Journal of 


Experimental Psychology 


ARTHUR W. MELTON, Editor 
Air Force Personnel and Training Research Center
Lackland Air Force Base
San Antonio, Texas 


CONSULTING EDITORS 


L. G. Humphreys, AF Personnel and Training Research Center
Arthur L. Irion, Tulane University
Donald B. Lindsley, University of California
Neal E. Miller, Yale University 
Kenneth W. Spence, State University of Iowa 
Benton J. Underwood, Northwestern University 
Delos D. Wickens, Ohio State University 


Lorraine Bouthilet, Managing Editor 





CONTENTS 
Verbal Concept Learning as a Function of Instructions and Dominance Level:
B. J. Underwood and J. Richardson 229
Concept Learning with Differing Sequences of Instances: K. H. Kurtz and C. I. Hovland 239
Reversal and Nonreversal Shifts in Card-Sorting Tests with Two or Four Sorting Categories:
H. H. Kendler and M. S. Mayzner, Jr.
Prediction of Response in Verbal Habit Hierarchies: L. R. Peterson
Effect of Anxiety, Motivational Instructions, and Failure on Serial Learning: I. Sarason 253


Tests of Two Theories of Decision in an “Expanded Judgment” Situation:
F. W. Irwin, W. A. S. Smith, and J. F. Mayfield 261
Familiarity and Recognition of Nonsense Shapes: M. D. Arnoult
Time and Intensity as Determiners of Perceived Shape:
H. Leibowitz and L. E. Bourne, Jr. 277


ere ee eee ot Segetee eewe feo Meteerts Dingtey Caneel Putwns 
T. Katchmar, S. Ross, and T. G. Andrews 282


Latent Learning in the Three-Table Apparatus: W. F. Oakes
Note on “Wishing with Dice”: L. G. Humphreys





American Psychological Association 
Vol. 51, No. 4, April 1956







Articles are published in the order of their 
stances. Authors are su eal its wi 
of the cost of an author's alterations in 
Priority in tion is to articles whose 
cost of tion. In 19 rte se sas smu te 
Authors of priority publications receive no free offprints. 


Address all articles submitted for regular or priority publication to the
editor, Arthur W. Melton, Air Force Personnel and Training Research
Center, Box 1557, Lackland Air Force Base, San Antonio, Texas.


Manuscripts should be submitted in duplicate.
Authors are cautioned to retain a copy of the manuscript.


















APRIL, 1956 








VERBAL CONCEPT LEARNING AS A FUNCTION OF
INSTRUCTIONS AND DOMINANCE LEVEL

BENTON J. UNDERWOOD AND JACK RICHARDSON

Northwestern University ¹


The task of S in the present research 
is that of learning how objects are 
related. We are concerned with two 
variables that influence the rate at 
which such learning occurs. The 
stimuli used consist of names of com- 
mon objects. When such a stimulus 
is presented to S, it could elicit a 
number of different verbal responses. 
For example, if the stimulus word 
tomato were presented (with appro- 
priate instructions) we might expect 
a series of responses such as “fruit,”
“red,” “round,” “soup,” “sauce,” and
so on. Thus, each S may have a
hierarchy of responses to such a
stimulus, the hierarchy presumably
reflecting differences in response
strengths of different responses. Different
Ss may be expected to have
different hierarchies to some stimuli;
to others there may be not only high
agreement among responses elicited
but the response strengths may be
highly comparable.

¹ This work was done under Contract N7onr-45008,
Project NR 154-057, between Northwestern
University and the Office of Naval Research.

In the present task S is presented
with names of four common objects
and he is to discover what single
characteristic can be used to describe
all four of them. For example, brick,
cherry, tomato, and lips can all be
described as “red.” We have discussed
elsewhere (2) the rather simple
proposition that concept learning (as
described above) would occur more
quickly the stronger the common
descriptive response to the stimuli.
Thus, if “red” were high in the
hierarchy of responses to each of these
four stimuli, learning should take
place more rapidly than if it were low
in the hierarchy. In the present
study the strength of a given response
to a stimulus, relative to the strength
of other responses, is called response
dominance. An arithmetic definition
of dominance of response to each
stimulus used will be given in the
procedure section. In the experiments
to be reported, the effect of
three levels of response dominance is
studied.

As suggested above, with relatively
unrestricted instructions a number of
different classes of descriptive
responses would be elicited by verbal
stimuli. That is, with free-association
instructions, S might give opposites,
functional associations, classificatory
responses, synonyms, and so
on. For compelling practical reasons
(which are too detailed to set forth
here) we have found it necessary to


restrict sharply the class of descriptive 
responses by which learning of re- 
lationships among objects is studied. 
More specifically, we are studying the 
learning of relationships among ob- 
jects when these relationships are 
based on what we call sense impres- 
sions. By this we mean relationships 
based on rather immediate sense data. 
Thus, the descriptive characteristics 
refer to size, shape, color, smell, feel, 
and so on. Since our concept-learning
task is limited to a particular class
of similarities among objects, it becomes
quite apparent that the nature
of the instructions to S would be an 
important variable. At one extreme 
we could actually give S the specific 
characteristics by which the objects 
go together. At the other we could 
simply tell S what concepts are and 
let him discover completely how the 
objects are related (according to our 
scheme). In spite of the obviousness
of this instructional variable, we have
manipulated it in the present study
for three reasons. First, a demonstration
of the obvious has never been
made. Secondly, we wanted to know 
something about the magnitude of 
differences in learning speed produced 
by different instructions so that sub- 
sequent experiments could be planned 
more intelligently. Finally, with un- 
restricted instructions, we can study 
the process by which S discovers the 
particular class of responses required 
by the task as well as the particular 
responses within the class. 

In summary, the present research 
is a study of concept learning as a 
function of (a) three different domi- 
nance levels of responses to verbal 
stimuli, and (b) three different sets of 
instructions which reflect different 
amounts of information concerning 
the nature of the concept to be 
learned. 




PROCEDURE 


Materials.—The words used as stimuli were 
concrete nouns. They were taken from a list of 
213 words which had been subjected to a scaling 
procedure to provide a measure of dominance 
level of various responses elicited by each 
stimulus. Details of the scaling procedure have 
been presented elsewhere (3), so only the critical 
points will be presented here. In the final pro- 
cedure the nouns were presented, one at a time, 
to 153 Ss under instructions to give the first 
sense-impression response which occurred to 
them upon seeing the stimulus word. The de- 
lineation of what was meant by a sense impres- 
sion was accomplished through instructions 
followed by a series of practice words. When 
Ss were familiar with the class of associations 
required, additional practice words were pre- 
sented at an increasing rate of speed until the 
rate reached 6 sec. per word, the rate used for 
the experimental words. 

The 153 Ss were run as seven groups, varying 
in size from 13 to 30. Each group had a different 
randomized order of the words in an effort to 
avoid bias resulting from response sets. The 
words were flashed on the screen and simul- 
taneously pronounced. The S was instructed 
to write down the first sense impression which 
occurred to him. At the end of the 6-sec. 
interval another word was flashed, and so on. 
To each word, then, we had a maximum of 153 
responses, although we did not always get a 
response from all Ss within the short interval 
allowed. Responses to each word were then 
categorized so that responses having the same 
general meaning, e.g., “small,” “little,” “tiny,” 
were all put in the same category. The percentage
of total responses in each category was
determined. Categories for a response were 
maintained if 5% or more of the total frequency 
fell in that category; otherwise, the responses 
were placed in a miscellaneous category. An 
example may make this clear. To the stimulus 
word cigar, 152 responses were given. Of this 
total 61 (40%) said “smelly” (or equivalent), 
40 (26%) said “brown,” and 21 (14%) said 
“long.” A total of 19 (12%) responses fell into 
the “miscellaneous” category. Dominance level 
of responses is defined directly in terms of per- 
centage frequency of a particular category of 
response. In the above example “smelly” is the 
most dominant and “long” the least dominant 
(of those given by 5% or more of the Ss). 
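The percentage definition of dominance described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the authors' scoring procedure: the function name `dominance_levels` and its parameters are assumptions, and only the three category counts quoted for cigar are used.

```python
def dominance_levels(category_counts, total_responses, threshold_pct=5.0):
    """Rank categories reaching the threshold; pool the rest as miscellaneous.

    category_counts: dict mapping a response category to its raw count.
    total_responses: number of responses given to the stimulus word.
    """
    kept = {}
    miscellaneous = 0
    for category, count in category_counts.items():
        pct = 100.0 * count / total_responses
        if pct >= threshold_pct:
            kept[category] = pct
        else:
            miscellaneous += count
    # Most dominant category first.
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return ranked, miscellaneous


# Counts quoted in the text for the stimulus word "cigar" (152 responses).
cigar_counts = {"smelly": 61, "brown": 40, "long": 21}
ranked, misc = dominance_levels(cigar_counts, total_responses=152)
for category, pct in ranked:
    print(f"{category}: {pct:.0f}%")
```

Rounded to whole percentages, the three categories reproduce the 40%, 26%, and 14% given in the example.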

It can be seen that for each of the 213 words 
we have the frequency with which a particular 
sense-impression description was used by Ss. It 
can be seen, furthermore, that somewhat diverse 
stimuli may be described by the same sense 
impression. For example, the response “soft”
was given to 35 different stimuli with 5% or
greater frequency. Thus, this response was
given to banana, bread, belly, flannel, jellyfish,
lips, custard, moccasin, pillow, sheep, skin, and
so on. In short, all of these objects are related
because they were all described as “soft.” In 
the concept-learning task we have chosen a 
limited number of such stimuli, and S is to 
discover how they are related. 

Construction of lists.—The complete list of 213
words and responses to them have been pub- 
lished elsewhere (3). There are a number of 
rules we followed in constructing the lists for 
studying concept recognition. Some of these 
rules were purely arbitrary, while others were 
required in order to isolate the dominance vari- 
able. We chose first to work with six different 
concepts, namely, round, small, white, smelly, 
soft, and big. Each of these concepts was repre- 
sented at what we will call high, medium, and 
low dominance levels. Thus, there is a total of 
18 concepts on which we have learning data. 

To make the task a reasonable one we arbi- 
trarily decided that a single list would consist of 
24 nouns, representing six concepts. Each con- 
cept would have four examples (instances, il- 
lustrations), and S was to learn to respond to 
these four examples in the same way. Three
independent lists of 24 nouns each were con- 
structed. Since six independent concepts were 
involved in each list, our 18 concepts are ac- 
counted for. 

Certain considerations led us to vary domi- 
nance level within each list rather than to have 
a separate list for each of the three levels of 
dominance. For each noun we knew the fre- 
quency of various responses. However, a noun 
having a highly dominant response would have 
only a few distracting or interfering tendencies, 
and these should be weak, relatively. On the
other hand, a response to a noun which we were
using as an example of a concept of low dominance
would have one or more very strong
distracting or interfering tendencies. In short,
number and strength of interferences and
dominance level are inversely related. Now it
would seem that if dominance level as defined
here was directly related to concept-learning
rate, the explanation could be based on the fact
that high-dominant concepts had less interference
than did low. Indeed, this is essentially
our interpretation of the results. However, we
could not place all the high-dominant concepts
in one list, the medium in another, and the low
in a third, and expect to get a clear interpretation.
It will be remembered that the responses
given by S in the scaling procedure were sense
impressions, and many nouns elicited the same
sense impressions. This commonality in response
was necessary in order to obtain the


materials for the concept-learning studies. But, 
if we chose a given noun for a low-dominance 
list, responses to it other than the one we were 
calling correct may be “correct” for another 
concept in the list. For example, assume word
A elicits response X with 15% frequency and
response Y with 70% frequency. Assume further
that we use A in a list as an example of a
low-dominant concept X. We also choose
another word, B, to which the correct concept
response is Y. Thus, Y will be correct for one
word and incorrect for another. We speak of
this as overlap of response tendencies among
concepts within a list. In our materials we
cannot avoid this completely; therefore, we have
to equalize this overlap. Such equalization
cannot be accomplished if we put all of the
high-dominant concepts in one list and low in another.
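The notion of overlap in the A/B example can be made concrete with a small count. The sketch below is an assumption about how such a tally might be made; the function `count_overlap` and the toy list are hypothetical and are not taken from the published word norms.

```python
def count_overlap(list_items):
    """Count overlaps of response tendencies within one list.

    list_items maps each word to (correct_concept, other_responses), where
    other_responses is the set of scaled responses other than the one
    designated correct. An overlap is an "other" response that is the
    correct concept name for a different concept in the same list.
    """
    concepts_in_list = {concept for concept, _ in list_items.values()}
    overlap = 0
    for word, (correct, others) in list_items.items():
        overlap += sum(
            1 for response in others
            if response in concepts_in_list and response != correct
        )
    return overlap


# Hypothetical two-word list mirroring the example in the text.
toy_list = {
    "A": ("X", {"Y"}),   # X is correct (15%); Y is the stronger response (70%)
    "B": ("Y", set()),   # Y is the correct concept for word B
}
print(count_overlap(toy_list))  # 1
```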

In each list we have four instances of each of
two high-dominant concepts, two of medium,
and two of low. Each level of dominance for a
given concept appears in a different list. Thus,
four instances of round as a high-dominant
response appear in one list, four instances of
round as a medium-dominant response in another,
and four instances of round as a low-dominant
response in a third. Since no S
learned more than one list, we have no biases
based on familiarity. The words used in each
of the three lists appear in Table 1. In Table 1
we have grouped the words forming a concept
although they were not presented to S in this
manner. We have also indicated the concept
involved and the mean dominance level.

Differences in dominance level can be seen
by reference to Table 1. To each word in Table
1 we know the frequency with which sense-impression
responses have been elicited. To the
stimulus barrel 72% of the Ss said “round”; to
doughnut 70% said “round,” to knob 68%, and
to balloon 55%. The average of these four
values is 66% as entered in the table. These
values may be contrasted with those for the
low dominance as seen in List 3. To the word snail
the response “round” occurred 14% of the time,
to cherry 14%, to grapefruit 12%, and to skull,
11%. The average is 13%.
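The averages entered in Table 1 follow directly from the four percentages quoted for each concept. A minimal check, using only the values given in the text (the helper name `mean_dominance` is an assumption):

```python
from statistics import mean

# Percentage of Ss giving "round" to each noun, as quoted in the text.
round_high = {"barrel": 72, "doughnut": 70, "knob": 68, "balloon": 55}
round_low = {"snail": 14, "cherry": 14, "grapefruit": 12, "skull": 11}


def mean_dominance(percentages):
    """Average dominance level across the four instances of a concept."""
    return mean(percentages.values())


print(mean_dominance(round_high))  # 66.25, entered in the table as 66%
print(mean_dominance(round_low))   # 12.75, entered in the table as 13%
```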

We have said that dominance level of an
arbitrarily chosen response to a noun and frequency
and strength of other responses (having
at least 5% frequency) to that noun are inversely
related. The frequency measures show that for
all six high-dominant concepts combined there
were 25 other responses given with greater than
5% frequency. For the medium concepts the
value is 48, and for the low, 53. The relationship
apparently is not linear when expressed in
frequency terms. However, the strength measures
give somewhat greater linearity. If we
take the average strength (average percent
frequency) of other responses, we obtain the
values of 8.3%, 19%, and 31% for high, medium,
and low concepts, respectively.

Three other characteristics of the lists constructed
must be indicated. We have kept the
total number of other responses fairly constant
for the three lists, the values being 44, 43, and 39
for the three lists in order. The equalization of
interference tendencies among concepts, as
discussed earlier, was fairly successful, there
being 8 instances of overlap in List 1, 10 in List
2, and 8 in List 3. Finally, in choosing the words
for a given concept from the master list, we
allowed no common other responses among the
four words. Thus, while village, minnow, crumb,
and germ all had the common description of
“small” given to them, no other common descriptive
terms applied to even two of the four
words.

TABLE 1

Concept-Learning Lists with Concept and Average Dominance Level Indicated

List 1
Concept and Dominance Level    Nouns
“round,” high (66%)            Barrel, Doughnut, Knob, Balloon
“small,” high (75%)            Village, Minnow, Crumb, Germ
“white,” medium (37.5%)        Bone, Collar, Frost, Lint
“smelly,” medium (51%)         Garlic, Gasoline, Pine, Sulphur
“soft,” low (21.5%)            Custard, Lips, Moss, Sheep
“big,” low (15%)               Camel, Forest, Hospital, Limousine

List 2
Concept and Dominance Level    Nouns
“round,” medium (31%)          [illegible], Derby, Platter, Pill
“small,” medium (45%)          [illegible], Capsule, Mouse, Pollen
“white,” low                   Baseball, [illegible], [illegible], [illegible]
“smelly,” low                  Daffodil, Goat, Gym, Sauerkraut
“soft,” high                   [illegible], Chamois, Fur, Pillow
“big,” high (80.5%)            Auditorium, City, Elephant, Mansion

List 3
Concept and Dominance Level    Nouns
“round,” low (13%)             Snail, Cherry, Grapefruit, Skull
“small,” low (16%)             Earthworm, Closet, Freckle, Tack
“white,” high (76.5%)          Milk, Chalk, Snow, Teeth
“smelly,” high (74.5%)         Ether, Garbage, Gardenia, Manure
“soft,” medium (42%)           Bread, Flannel, Jellyfish, Moccasin
“big,” medium                  Boulder, Gorilla, Ocean, [illegible]

In summary: there are three lists, each list 
having six concepts to be learned, two each of 
high, medium, and low dominance. Taking all
six high-dominant concepts into account, the 
average dominance level is 74.8%; for medium, 
40.7%, and for low, 16.1%. 

Presentation of the lists.—The lists were pre- 
sented at a 4-sec. rate on a Hull-type memory 
drum. Three different orders of each list were 
used, being random except for the restriction 
that at least one other word must separate the 
presentation of two words which were examples 
of the same concept. A 4-sec. interval occurred 
between each presentation of the list. The lists 
were presented for 20 trials irrespective of per- 
formance attained. 
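The ordering restriction described above (random order, but never two examples of the same concept in succession) can be produced by reshuffling until the restriction is met. The sketch below is one possible way to do this and is not claimed to be the procedure the authors used; `constrained_order` is a hypothetical helper, and the word-to-concept mapping is List 1 from Table 1.

```python
import random


def constrained_order(word_to_concept, rng, max_tries=10_000):
    """Shuffle words until no two adjacent words share a concept."""
    words = list(word_to_concept)
    for _ in range(max_tries):
        rng.shuffle(words)
        adjacent_pairs = zip(words, words[1:])
        if all(word_to_concept[a] != word_to_concept[b] for a, b in adjacent_pairs):
            return list(words)
    raise RuntimeError("no valid order found within max_tries")


list_1 = {
    "barrel": "round", "doughnut": "round", "knob": "round", "balloon": "round",
    "village": "small", "minnow": "small", "crumb": "small", "germ": "small",
    "bone": "white", "collar": "white", "frost": "white", "lint": "white",
    "garlic": "smelly", "gasoline": "smelly", "pine": "smelly", "sulphur": "smelly",
    "custard": "soft", "lips": "soft", "moss": "soft", "sheep": "soft",
    "camel": "big", "forest": "big", "hospital": "big", "limousine": "big",
}

order = constrained_order(list_1, random.Random(0))
print(order)
```

Rejection sampling is used here only because it is short and obviously satisfies the restriction; three such orders per list would be drawn with different seeds.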

Most concept studies in the past have re- 
quired S to give some neutral response to indicate 
concept learning. Thus, a nonsense syllable 
might be used as an indicator that S knew the 
concept. A recent study (1) has shown that 
such a procedure may add a very large com- 
ponent of rote learning to the task. That is to 
say, S may discover rather quickly how the 
stimuli “go together” but it may take him 
considerable time to learn the nonsense syllable 
which is used to indicate that he has learned how 
they go together. In order to avoid this rote
component we had S respond with the concept
name itself. For example, to the four stimuli
village, minnow, crumb, and germ, he would 
respond with “small” if he correctly recognized 
the concept. 

Instructions to S indicated that he was to 
give a response to each word during the 4-sec. 
interval even on the first trial. After each 
response E said “right” or “wrong.” All responses
were recorded. The E used reasonable
flexibility in deciding whether a response was 
right or wrong. Preliminary studies, however, 
allowed us to set up responses which were to be 
called “right” so that it was rare that E had to 
exercise judgment at the time S responded. 
The basic criterion, of course, for a response 
being called “right” was that the response given 
must reflect the characteristic required. 

Subjects.—A total of 144 Ss served in the 
experiment. These Ss were taking elementary 
psychology and on this characteristic were com- 
parable to the 153 used originally to scale the 
words. The 144 Ss were divided into three 
groups of 48 each. Each of the three groups 
received different instructions (see below). 
Within each group of 48 there were three subgroups
of 16 Ss, each subgroup being presented
with a different list. The Ss were assigned in 
rotation to the lists in the order of their appearance
at the laboratory. That is to say, the first
S had List 1, the second List 2, the third List 3,
the fourth List 1, and so on. It should be 
mentioned again that lists as such were not 
intended to be a variable. Our intent was to 
construct lists of as near equal difficulty as 
possible. The use of three lists allowed us to 
test the influence of dominance level for a single 
concept since all three levels of a single concept, 
e.g., “round,” were represented when all three 
lists are considered. 
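Assignment in rotation amounts to cycling through the three lists in order of arrival. A brief sketch under that reading (the function name `assign_lists` is an assumption):

```python
from collections import Counter


def assign_lists(n_subjects, n_lists=3):
    """Assign Ss to lists in rotation by order of appearance (1-based lists)."""
    return [(index % n_lists) + 1 for index in range(n_subjects)]


assignment = assign_lists(48)      # one instruction group of 48 Ss
print(assignment[:4])              # [1, 2, 3, 1]
print(Counter(assignment))         # 16 Ss per list
```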

Instructions.—Each of the three groups of 48
Ss received instructions which differed in certain
critical aspects. These three groups of Ss were
not run concurrently and we do not know that 
they are comparable in ability. However, all 
came from the courses in elementary psychology 
under the same system for obtaining Ss and there 
is no reason to expect large sampling differences. 
The differences in performance as a function of 
instructions are so great that we do not think it 
is likely that these differences can be attributed 
exclusively to sampling bias. 

The three sets of instructions differed as to 
the amount of information given S concerning 
the nature of the concept. First, however, we 
will give the essential points in the instructions 
which were common to all three groups. The
following four points give the instructions in 
abbreviated form. 


1. There are 24 nouns on the tape; four of 
these can be described by the same word, four 
others by a different word, and so on. 

2. Each noun will be presented for 4 sec. and
you must give a response to each one within the 
4-sec. period. I will tell you “right” or “wrong” 
after each response. 

3. We will go through the list time after time 
and you are to give a response to each word 
every time it comes up even though you know 
the response is wrong or even if you have been 
getting it right consistently. 

4. If you learn the list completely you will be 
responding with only six different words since 
each response is correct for four of the nouns.
These are six distinctly different words and are 
not in any sense synonyms. 


The unrestricted instructions (UR) told S 
nothing about the nature of the concepts to be 
learned. The Ss were instructed to respond in
a free-association fashion to the words on the 
first trial. They were further told that it would 
be a good idea to vary their responses on the 
second trial from those given on the first, on the 
third from those on the second, and so on, until 
they started to get some responses correct. 

In the partially restricted (PR) instructions
Ss were told the class of responses needed to
form the concepts. It is probably more correct
to say that S arrived at an understanding of the
class through appropriate questioning by E.
The S was asked about simple ways of describing
common objects. The E prodded S on this
matter until S gave sense-impression descriptions.
Furthermore, we did not allow S to give
only one or two such sense impressions; rather,
we forced him to give a number of these so that
a set for a particular type of sense impression
did not occur. From this procedure, presumably,
S knew the class of responses needed
but did not know the particular responses within
the class. It can be seen that these Ss were
instructed in the same manner as were Ss who
originally scaled the words.

Fig. 1. Mean total correct responses over
20 trials as a function of dominance level and
instructions. UR refers to unrestricted
instructions, PR, partially restricted, and CR,
completely restricted. Ordinate: mean correct
responses; abscissa: dominance level.

In the completely restricted (CR) instructions 
S actually was given the six responses which 
were correct. He was allowed to study these 
until he could repeat them. Furthermore, he 
was given the card on which the six responses 
were printed and kept it in front of him during 
concept learning so that he could refer to it if 
necessary. 


RESULTS AND DISCUSSION


Over-all performance.—All Ss were
given 20 trials. The first performance
measure to be considered is the
mean total correct responses given on 
all 20 trials for the nine conditions 
(three dominance levels against three 
sets of instructions). It will be re- 
membered that three different lists of 
concepts were involved, each list 
having two concepts at each domi- 
nance level. For this analysis performance
scores for all three lists have
been combined so that at each domi- 
nance level there are six different 
concepts involved. 

Figure 1 shows the influence of the 
two variables. As dominance level 
increases, performance (number of 
correct responses) increases. And, 
the greater the amount of information 
given S by instructions, the better the 
performance. The statistical analysis 
has provided some difficulty. Vari- 
ance within each instructional level 
UR and PR is homogeneous, but 
differs so much between instructional 
levels that we have found no way to 
adjust for this. Dominance within 
each instructional level is highly 
significant. However, there is a sug- 
gestion in Fig. 1 that as more in- 
formation is given S by instructions, 
the influence of dominance level is 
reduced. Because of the great heter- 
ogeneity of variance resulting from 
different instructions, we have found 
no satisfactory way to evaluate sta- 
tistically the suggested trends toward 
interaction. Ignoring heterogeneity, 
the F falls far short of significance. 

First-trial performance.—Probably
the clearest way of assessing the
validity of our dominance measure
is to examine performance on the first
trial. On this trial, within the limits
imposed by the instructions, the Ss
are responding most clearly on the
basis of strength of response tendencies
which they “brought” to the
laboratory. That is, the differential
reinforcement procedure has less
influence on this first trial than on
subsequent trials. The results for
the first trial are shown in Fig. 2,
where it is again clear that both
variables are highly effective.

Fig. 2. Mean total correct responses on
Trial 1 as a function of dominance level and
instructions. Ordinate: mean correct
responses; abscissa: dominance level.

Concept attainment.—The data pre- 
sented thus far give no direct evidence 
on attainment of a particular concept. 
That is, nothing has been asserted 
about the rate of acquisition of all 
four examples defining a given con- 
cept. Since the basic results when 
viewed in this manner are essentially 
the same as those presented above, 
and since the trends are quite com- 
parable for the three sets of instruc- 
tions, we are presenting the data for 
all three instructions combined. The 
data are shown in Table 2. We have 
determined on which trial the correct 
response was given to all four ex- 
amples of a concept, and have defined 
this trial as the point of concept 
attainment. Pooling the data for all 
Ss, we have indicated the number of 
times the concept was attained on 
Trials 1-5, 6-10, 11-15, and 16-20. 
There is a final column indicating the 
number of concepts not attained in 
the 20 trials. There are 144 Ss, each 
S being presented two concepts at 
each dominance level for a total of 
288. Table 2 shows that as domi- 
nance level increases, number of con- 
cepts attained on Trials 1-5 increases 
and number of concepts not attained 
in the 20 trials decreases. In general, 
then, this measure of performance 
shows quite the same picture as did
the measure relating number of correct
responses to dominance level. 
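The attainment criterion (the first trial on which all four examples of a concept are answered correctly) and the grouping into blocks of five trials can be expressed as follows. This is a sketch of the definition as stated; the function names and the small illustrative record are hypothetical and do not reproduce any S's data.

```python
def attainment_trial(correct_by_trial):
    """Return the first trial (1-based) on which all four examples were correct.

    correct_by_trial: one entry per trial, each a sequence of four booleans.
    Returns None if the concept was not attained within the trials given.
    """
    for trial_number, outcomes in enumerate(correct_by_trial, start=1):
        if all(outcomes):
            return trial_number
    return None


def trial_block(trial_number, block_size=5):
    """Label the block of trials (e.g. '6-10') containing the attainment trial."""
    if trial_number is None:
        return "not attained"
    start = ((trial_number - 1) // block_size) * block_size + 1
    return f"{start}-{start + block_size - 1}"


# Hypothetical record for one concept: attained on the second trial.
record = [[True, False, True, True], [True, True, True, True]]
print(trial_block(attainment_trial(record)))  # 1-5

# Each of the 144 Ss met two concepts at each dominance level.
print(144 * 2)  # 288 concepts per dominance level
```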


TABLE 2

Number of Concepts Attained on Successive Blocks of Five Trials
for Each Dominance Level for All Instructions Combined

Dominance    Trials 1-5    6-10    11-15    16-20    Not Attained
Low          [entries illegible]
Medium       [entries illegible]
High         [entries illegible]


Dominance level and individual concepts.—It
will be remembered that
each particular concept, e.g., white,
was represented at all three levels of
dominance, each level in a different
list. The question may be raised as
to how perfectly related are dominance
level and performance for
separate concepts. With the partially
restricted instructions, instructions
which are essentially the same
as those given Ss for the scaling
procedure from which dominance
levels were derived, the alignment
between dominance and learning is
perfect for all six concepts. That is
to say, for each of the six concepts
considered separately, the total correct
responses given during the 20
trials was directly related to dominance
level. For the unrestricted
instructions and for the completely
restricted instructions, there were
minor reversals in the ordering.
When the results for all three sets
of instructions are pooled, the ordering
is perfect, as shown in Fig. 3. In
each case, fewer correct responses are
given for low-dominance concepts
than for medium, and fewer for
medium than for high. Nevertheless,
it can be seen that there are
appreciable differences among concepts
at a given level of dominance. The
implication of this is that there are
other factors involved in learning in
addition to dominance level. Certain
of the responses (sense impressions)
very likely have higher dominance as
a descriptive response than do others.
Also, it is quite likely that our procedure
of having different levels of
concepts in the same list introduces a
source of variance. Thus, while
dominance is a very important variable
in the learning, it does not by
any means account for all differences
in rate of acquisition as this has been
determined here.

Fig. 3. Mean total correct responses over 20 trials as a function of
dominance level for particular concepts. H, high dominance;
M, medium dominance; L, low dominance. Abscissa: concept.

Intralist interference.—We noted
earlier that our conception of why
dominance achieves status as an
important variable is because the
higher the dominance level the less
the relative strength of competing
response tendencies. Within limits
we know what the frequency and
strength of these competing response
tendencies are. We shall present,
therefore, some evidence of how these
competing response tendencies appear
in the learning records. The first
set of data are shown in Fig. 4, which
consists of two parts. One part
concerns the experiment using
unrestricted instructions, the other the
partially restricted instructions. The
first 9 of the 20 trials are shown along
the baseline. Only these 9 trials are
included since they show the essential
facts. The ordinates represent the
total times a known interfering
response occurred. By a known
interfering response we mean a response
which had been given with 5% or
greater frequency on the original
scaling of the stimulus words.

Fig. 4. Interfering responses as a function of dominance level.
Panels: unrestricted instructions; partially restricted instructions.
Ordinate: total responses; abscissa: trials.
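The definition of a known interfering response (a response given with 5% or greater frequency in the original scaling, but not the one designated correct in the list) lends itself to a simple tally per trial. The sketch below is an assumption about how such a count could be made; the function name is hypothetical, the cigar percentages are those quoted earlier, and treating “long” as the correct concept for cigar is purely illustrative.

```python
def count_interfering(trial_responses, scaled_pct, correct, threshold_pct=5.0):
    """Count known interfering responses on one trial.

    trial_responses: iterable of (word, response) pairs.
    scaled_pct: dict word -> {response: percentage on the original scaling}.
    correct: dict word -> response designated correct in the list.
    """
    total = 0
    for word, response in trial_responses:
        pct = scaled_pct.get(word, {}).get(response, 0.0)
        if response != correct[word] and pct >= threshold_pct:
            total += 1
    return total


scaled_pct = {"cigar": {"smelly": 40.0, "brown": 26.0, "long": 14.0}}
correct = {"cigar": "long"}  # hypothetical designation for illustration

trial = [("cigar", "smelly"), ("cigar", "long"), ("cigar", "expensive")]
print(count_interfering(trial, scaled_pct, correct))  # 1
```

Only “smelly” is counted: “long” is the correct response, and “expensive” never reached 5% on the scaling.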

In the experiment with unrestricted 
instructions it can be seen that the 
interfering responses are initially few 
in number and then increase, followed 
by a decrease. This is contrasted 
with the partially restricted instruc- 
tions where the frequency decreases 
from an initial high. With the par- 
tially restricted instructions Ss were 
told the class of responses which were 
correct; in the unrestricted instruc- 
tions this was a part of the learning 
process and is reflected in the initial 
increase in this class of responses up 
to the fifth or sixth trial. 

The frequency of interfering re- 
sponses as plotted in Fig. 4 are in- 
versely related to rate of acquisition 
as shown in earlier data. For the 
unrestricted-instruction experiment, 
for low dominance, the errors as 
plotted in Fig. 4 were greater in 
frequency than were the correct responses
through the first six trials.
Not until the seventh trial did the
correct responses become more fre-
quent. With partially restricted in-
structions the errors were more fre-
quent than the correct responses for
low dominance for only two trials.
It should be clear that the errors to
which we are referring are those
responses which occurred with 5% or
greater frequency in the original
scaling, but which were incorrect
responses in the concept-learning lists
as constructed for this experiment.
In the unrestricted-instruction experiment
there were many other errors
initially, since the instructions essentially
asked S to give free associations.
Errors of this nature decreased
rapidly as S began to learn
that the class of responses required
were sense impressions.


TABLE 3

Errors as a Function of Percentage of Frequency (Strength)

Per Cent Frequency | Number Words | Number Errors | Mean Errors Per Word

5 | 48
16 ) 78
Ma 142
SG . 144,

Another way to analyze the relationships
among rate of learning, dominance,
and interference is to determine if fre-
quency of interfering responses is related
to strength of interfering tendency. By
strength we mean the percentage of
times it was given on the original
scaling. A sample of the findings is
given in Table 3. The data come from
examination of errors on the first 9 trials
of low-dominance concepts for the un-
restricted-instructions experiment. The
first column, “Per Cent Frequency,”
indicates groupings of response strengths
in terms of percentage of frequency of
occurrence on original scaling. For ex-
ample, there were 19 responses which







occurred between 5% and 15% of the 
time in the scaling. In the present lists 
these are wrong responses. The third 
column, “Number Errors,” indicates the
total number of times these 19 words 
were given as errors in concept learning. 
The final column gives the mean errors 
based on the number of words of that 
grouping which could produce errors. It
is quite clear from the last column that
the greater the strength (percentage of
occurrence in original scaling) of an
interfering response, the greater the number of
times it is given as an overt error.
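
The computation behind the final column of Table 3 can be sketched as follows. This is only an illustration of the arithmetic described above (errors in a grouping divided by the number of words in that grouping); the function name and the two groupings are hypothetical and are not values from the table.

```python
def mean_errors_per_word(number_words, number_errors):
    """Mean overt errors per interfering word within one strength grouping."""
    if number_words <= 0:
        raise ValueError("a grouping needs at least one word")
    return number_errors / number_words


# Hypothetical groupings (label, number of words, number of errors).
# These figures are invented for illustration and are not the Table 3 data.
groupings = [("weaker", 20, 50), ("stronger", 10, 80)]

means = {label: mean_errors_per_word(words, errors)
         for label, words, errors in groupings}
```

With these invented counts the stronger grouping yields more errors per word than the weaker one, which is the pattern the text reports for the actual data.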

Miscellaneous.—We have made a num- 
ber of other analyses of our data. Most 
of these merely supplement the infor- 
mation already given and will not be 
presented. A few findings are worth 
mentioning briefly. 

The three lists used did not differ 
widely in difficulty. If all three experi- 
ments are combined the mean total 
correct responses were 58.05, 53.00, and
57.99 for Lists 1, 2, and 3, respectively. 
List 2 appears somewhat deviant, but 
this deviation does not achieve statistical 
significance. 

We have plotted a number of indi-
vidual acquisition curves for the 20
trials. There are no easy generalizations
which can be made about them. Some 
curves showed gradual increments with 
no more than one or two correct re- 
sponses being gained from trial to trial. 
Others showed jumps of four to eight 
correct responses on a single trial. But 
the gradualness and the spurts were not 
related in any systematic fashion to fast 
and slow learning. Jumps in the curves 
occurred more frequently for Ss in the 
partially restricted and restricted groups 
than in the group given unrestricted 
instructions. The group learning curves 
are quite smooth and provide no addi- 
tional information not already presented 
in other ways. 


SUMMARY 


The learning of concepts to verbal stimuli 
was studied as a function of (a) nature of in-
structions, and (b) three dominance levels of the
concepts. Dominance level was determined by 
the frequency of a given restricted association 
elicited by the verbal stimuli in an earlier scaling 
procedure. The greater the frequency of a given
response, the higher the dominance level. In- 
structions were varied (three ways) in terms of 
amount of information given S concerning the 
nature of the concept to be learned. Each list 
consisted of 24 nouns, these 24 nouns being four 
instances of six concepts. The S was to learn 
that the four nouns “went together” because 
they could be described by a single sense im-
pression, e.g., size, shape, color, etc. The six
concepts used were large, small, white, soft,
smelly, and round. Each concept was used at
each level of dominance and with each set of 
instructions. A total of 20 trials was given to 
144 Ss with the response measures being number 
of correct responses made in 20 trials and number 
of concepts attained in 20 trials. The stimuli 
were presented at a 4-sec. rate with S responding 
directly in terms of sense impressions, i.e., S 
would respond with “large,” “small,” etc., to be 
correct. The major results were: 


1. The higher the dominance level, the greater 
the number of concepts learned and the greater 
the number of correct responses given in 20 
trials. 

2. The greater the amount of information 
given S concerning the nature of the concepts to 
be learned, the more rapid the acquisition. 

3. Dominance level and number of interfering 
responses were inversely related, and the shape 
of the interfering response curves varied with 
the nature of the instructions. The stronger the 
interfering responses, the more frequently they 
occur in the learning records. 


REFERENCES 


1. Richardson, J., & Bergum, B. Distributed
practice and rote learning in concept
formation. J. exp. Psychol., 1954, 47,
442-446.

2. Underwood, B. J. An orientation for re-
search on thinking. Psychol. Rev., 1952,
59, 209-220.

3. Underwood, B. J., & Richardson, J. Some
verbal materials for the study of concept
formation. Psychol. Bull., 1956, 53,
84-95.


(Received May 9, 1955) 





Journal of Experimental Psychology 
Vol. 51, No. 4, 1956 


CONCEPT LEARNING WITH DIFFERING SEQUENCES 
OF INSTANCES 


KENNETH H. KURTZ AND CARL I. HOVLAND 


Yale University 


Under conditions where several 
concepts are learned concurrently and 
concept instances are presented suc- 
cessively, the instances of any given 
concept may be presented in varying 
degrees of proximity to one another. 
At one extreme these may be pre- 
sented one after the other without 
the interpolation of instances of any 
other concept, and at the other ex- 
treme two instances of a given concept 
may never occur in succession without 
the interpolation of one or more 
instances of other concepts. The 
present investigation concerns the 
rate of concept attainment under 
these two modes of presentation of 
concept instances. 

Theoretical considerations advanced
by Underwood (6) suggest
that the first condition in which the
instances of a given concept are pre-
sented in close proximity should
produce more rapid learning. This 
expectation is based upon the as- 
sumption that to abstract the com- 
mon property or properties of several 
concept instances, perceptual, ide- 
ational, or motor representations of 
the properties of these instances must 
occur contiguously. The implicit
representation may be either in direct
response to the presentation of a
concept instance or recalled from the
past presentation of an instance.
When two concept instances are pre- 
sented simultaneously, occurrence of 
perceptual representations of the rele- 
vant stimulus properties will depend 
primarily upon factors of set or at- 
tention; when the instances are pre- 
sented successively, the additional 


factor of memory is introduced, so 
that, even though the relevant proper- 
ties of a first instance are perceived 
at the time of presentation, they may 
be forgotten in the period intervening 
before the presentation of a second 
instance. The likelihood of forget- 
ting would be expected to be a 
function of such factors as the com- 
plexity of the original instance, the 
length of the intervening period, and 
the nature of the activities inter- 
polated during the period. 

The conditions employed in the
present experiment and those em-
ployed in a recent study reported by
Hovland and Weiss (4) may be re-
garded as bracketing adjacent seg-
ments on a continuum of conditions
facilitating to varying degrees the
contiguous perceptual representation
of concept instances. In the study
of Hovland and Weiss, learning with
simultaneous presentation of concept
instances was compared with the
theoretically less favorable condition
of successive presentation of instances,
and a significant difference was ob-
tained in the expected direction. In
the present study a condition similar
to the successive condition of the
Hovland and Weiss study was com-
pared with the theoretically still less
favorable condition in which concept
instances were not only presented
successively but were also intermixed
with instances of other concepts. In
this method of presentation, at the
time of presentation of an instance
of a given concept, retention of earlier
instances of the same concept would
be expected to be impaired both by





[Figure 1. Four drawings (A to D), one illustrative instance of each of the concepts kem, fov, haj, and yug.]

Fig. 1. Illustrative concept instances.


the longer delay intervening and by
interference arising from the inter-
polation of instances of different
concepts.¹


PROCEDURE 


The experimental problem consisted of the 
presentation of several instances of each of four 
concepts, followed by a test of mastery of these 
four concepts. Test performance was compared
following two different methods of presentation 
of the concept instances. In the first method, 
instances of the four different concepts were 
intermixed so that two instances of any given 
concept were separated by instances of one or 
more of the other concepts; in the second 
method, all the instances of any given concept 
were presented in close succession without the 
interpolation of instances of other concepts. 

The stimulus materials were simple geometric 
patterns which varied in four relevant dichoto- 
mous properties or dimensions (3). The di- 
mensions of variation were shape (circle or 
square), size (large or small), color (black or 
white), and position (up or down). Each of the 
four concepts was defined by a combination of 
two properties and was designated by a dis-
tinctive nonsense-syllable name. The names
and defining properties of the four concepts were
(a) kem, up and circle; (b) fov, up and square;
(c) haj, down and small; and (d) yug, down and 
large. Examination of the defining properties 
shows that every possible shape-color-size- 
position combination is an instance of one and 
only one of the four concepts. All the instances 
of a given concept were alike in the defining 
properties, but differed in the remaining proper- 
ties. For example, all kem’s were circles and up, 
but could be large or small, black or white. 
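
The claim that every shape-size-color-position combination is an instance of one and only one concept can be checked with a short enumeration. The sketch below is illustrative only; the function and constant names are assumptions, and the name fov is used for the up-and-square concept.

```python
from collections import Counter
from itertools import product

SHAPES = ("circle", "square")
SIZES = ("large", "small")
COLORS = ("black", "white")
POSITIONS = ("up", "down")


def concept_name(shape, size, color, position):
    """Return the nonsense name fixed by the two defining properties."""
    if position == "up":
        # Upper figures are named by shape; size and color are free to vary.
        return "kem" if shape == "circle" else "fov"
    # Lower figures are named by size; shape and color are free to vary.
    return "haj" if size == "small" else "yug"


# Count how many of the 16 relevant combinations fall under each concept.
counts = Counter(
    concept_name(shape, size, color, position)
    for shape, size, color, position in product(SHAPES, SIZES, COLORS, POSITIONS)
)
```

Because `concept_name` always returns exactly one name, each combination belongs to a single concept, and the tally shows four combinations of the relevant dimensions per concept.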

Stimulus materials.—One illustrative instance
of each concept is presented in Fig. 1. Each 
drawing consists of three parts: a 14-in. square 
frame, a smaller square overlapping this frame, 
and either a circle or square completely contained 


¹ The present problem has been independently
investigated in an unpublished research by 
Newman (5). The two studies differ con- 
siderably in the type of materials and procedures 
employed and thus complement each other and 
serve to extend the range of conditions investi- 
gated. 


within the frame. The shape, size, color, and 
position of the inner figure determined the four 
concepts. The two different shapes are illus- 
trated by drawings A and B, the sizes by C and 
D, the colors and positions by B and D. A fifth
dimension involving the placement and shading 
of the overlapping square was irrelevant to the 
concepts. 

The five dimensions, each with two possible 
values, yielded a total of 32 (= 2⁵) combinations
of properties to be used as concept instances. 
The 32 instances were drawn separately on 3 × 5-
in. index cards for presentation during the
learning procedure. These were divided into 
eight instances of each of the four concepts as 
follows: they were first divided into two groups 
according to whether the inner figure was up or 
down; those on top were further divided into 
kem’s and fov’s according to whether the shape
was circle or square; those on the bottom were 
divided into haj’s and yug’s according to whether 
they were small or large, respectively. 

Two test packs, each including two instances
of each concept, were also prepared. Together
these two packs included all 16 instances formed 
by combinations of the two values of each of the 
four relevant dimensions. All these instances 
had the same value of the irrelevant dimensions. 
Within a given test pack the two instances of 
any given concept differed in both of the di- 
mensions which were irrelevant to that particular 
concept. For example, if one kem was large and 
black, the other one in that test pack was small 
and white. 
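
One way to build two test packs satisfying the constraints just described is sketched below. The pairing rule (complementary values of the two non-defining dimensions go into the same pack) is an assumption that meets the stated requirements; the text does not say how the packs were actually assembled, and all names here are hypothetical.

```python
VALUES = {
    "shape": ("circle", "square"),
    "size": ("large", "small"),
    "color": ("black", "white"),
    "position": ("up", "down"),
}

DEFINING = {
    "kem": {"position": "up", "shape": "circle"},
    "fov": {"position": "up", "shape": "square"},
    "haj": {"position": "down", "size": "small"},
    "yug": {"position": "down", "size": "large"},
}


def build_test_packs():
    """Split the 16 relevant combinations into two packs of eight."""
    pack1, pack2 = [], []
    for name, defining in DEFINING.items():
        # The two dimensions that do not define this particular concept.
        d1, d2 = [dim for dim in VALUES if dim not in defining]
        (a1, b1), (a2, b2) = VALUES[d1], VALUES[d2]
        # Within a pack the two instances differ on both free dimensions.
        pack1.append((name, {**defining, d1: a1, d2: a2}))
        pack1.append((name, {**defining, d1: b1, d2: b2}))
        pack2.append((name, {**defining, d1: a1, d2: b2}))
        pack2.append((name, {**defining, d1: b1, d2: a2}))
    return pack1, pack2


pack1, pack2 = build_test_packs()
```

For kem, for example, the first pack holds the large black and the small white instance, matching the example in the text, and the second pack holds the remaining two.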

Preliminary training.—Throughout the ex- 
perimental session E and S sat at a table facing 
one another. Prior to training and testing on 
the experimental problem, each S was given 
practice on a preliminary problem to familiarize 
him with the procedures involved. The figures 
employed on the preliminary problem were 
equilateral triangles with the following vari- 
ations: number (one vs. two triangles), position
of apex (pointed up vs. down), pattern (checkered
vs. striped markings), and color of markings
(black and red vs. black and white). The S was
informed of the nature of these variations, and 
each variation was illustrated by presenting two 
instances differing only in the variation being 
demonstrated. It was explained to each S that 
learning a concept would consist in discovering 
the two properties in which a series of cards were 
all alike, i.e., that S was to be shown a series of 
four cards one at a time and would be required 
to find the common properties. These cards 
were all black and white and checkered. The 
series included all four combinations of two 
values of the two remaining dimensions (position 
of apex and number). The S was asked to 
report verbally what he believed to be the com- 
mon properties. If his answer was correct, E 







indicated this to S; if S’s answer was incorrect, 
E presented the series again. The presentation 
of the cards and questioning of S was continued 
until S gave the correct answer. When the first 
problem had been solved, the procedure was 
repeated with a second series of four cards having 
in common the colors black and red and the 
inverted position. 

Experimental task.—The experimental prob- 
lem was given immediately following completion 
of the preliminary problem. First, each of the 
four relevant variations was described to S and 
illustrated by a pair of instances differing only 
in the dimension being described. It was ex- 
plained to S that he was to learn four concepts 
and that all the instances of a given concept 
would be called by the same nonsense name. A 
card on which the four nonsense names were 
lettered was placed on the table in S’s view and 
left there throughout the remainder of the 
procedure. It was further explained that the 
instances would be shown one at a time and the 
name of each instance would be given by E as 
presented. The S was told that at the end of 
the learning series he would be asked to describe 
the properties corresponding to each of the 
nonsense names and would be asked to identify 
a series of test instances by their names. 

Two different experimental conditions were 
determined by the method of presentation em- 
ployed. In the first method, instances of all 
four concepts were intermixed so that two 
successive instances of any given concept were 
always separated by instances of one or more 
other concepts. In the second method, all the 
instances of a given concept were presented in 
immediate succession without the interpolation 
of instances of other concepts. The order in 
which the concepts were presented and the order 
of the instances within each concept were varied 
among different Ss. Half of the Ss were ran- 
domly assigned to each of the two methods of 
presentation. Each S was informed whether 
the different concepts would be presented 
separately or intermixed. 
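
The two presentation orders can be sketched as follows. The text specifies only the constraints (blocked by concept, or never two instances of the same concept in succession); the round-by-round construction of the mixed order below is an assumption chosen because it guarantees that constraint, and the function names are hypothetical.

```python
import random

CONCEPTS = ("kem", "fov", "haj", "yug")


def unmixed_order(instances_by_concept, rng):
    """All instances of one concept in succession, concept by concept."""
    concepts = list(instances_by_concept)
    rng.shuffle(concepts)
    order = []
    for concept in concepts:
        block = list(instances_by_concept[concept])
        rng.shuffle(block)
        order.extend((concept, instance) for instance in block)
    return order


def mixed_order(instances_by_concept, rng):
    """No two successive instances belong to the same concept.

    Assumes at least two concepts with equally many instances each.
    """
    pools = {c: list(v) for c, v in instances_by_concept.items()}
    for pool in pools.values():
        rng.shuffle(pool)
    rounds = len(next(iter(pools.values())))
    order = []
    for _ in range(rounds):
        concepts = list(pools)
        rng.shuffle(concepts)
        # Avoid repeating a concept across the boundary between rounds.
        if order and concepts[0] == order[-1][0]:
            concepts[0], concepts[-1] = concepts[-1], concepts[0]
        order.extend((c, pools[c].pop()) for c in concepts)
    return order


rng = random.Random(0)
instances = {c: list(range(8)) for c in CONCEPTS}  # eight instances per concept
mixed = mixed_order(instances, rng)
unmixed = unmixed_order(instances, rng)
```

Each round of the mixed order contains one instance of every concept, so a repetition could only occur across a round boundary, which the swap rules out.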

The cards bearing the 32 instances were 
presented manually by E. The training-testing
sequence was as follows: 


1. Presentation of concept instances. The 
E presented all 32 instances at the rate of 2 sec 
each and pronounced aloud the name of each 
instance as it was exposed. 

2. Verbal description test. The S was re- 
quested to report verbally the common proper- 
ties corresponding to each of the nonsense 
names, and his responses were recorded by E 
without any comment as to whether or not they 
were correct. If S failed to mention any of the 
names, E inquired: “Can you remember the 
properties of the ———'s?” 


3. Identification test. The eight instances
in the first test pack were presented one at a
time and S was requested to identify each by the
appropriate nonsense name. The S was re-
quired to guess when not certain, and each card
was exposed until S responded with a concept
name. No indication was given as to the cor-
rectness of S’s responses.


TABLE 1

Concept Attainment with Mixed and Unmixed Orders of Presentation

Trial | Mixed | Unmixed | P

Correct Identifications
1 | 3.61 | 4.69 | .10
2 | 4.38 | 5.54 | 1B

Scores on Verbal Description Test
4.54 | .03
6.46 | .01


Upon completion of the above procedures, 
the entire sequence was repeated a second time. 
Prior to the second presentation of the concept 
instances, the cards were shuffled so that they 
were not in the same order as on the first trial, 
although they were in the same series. For 
example, if the concepts had been presented in 
an unmixed sequence, then on the second trial 
they were also presented in an unmixed sequence, 
but the order of the various concepts and of the 
instances within each concept was changed. On 
the second identification test, the second test 
pack was substituted for the first. 

Subjects.—The Ss were 26 Yale undergraduate 
men hired through the Student Appointment 
Bureau. 


RESULTS 


Table 1 shows the mean number of 
correct identifications on the first and 
second test for the two methods of 
presentation. It will be seen that on 
both tests there was a slight difference 
in the expected direction, i.e., more 
correct identifications following the 
unmixed order of presentation. On 
neither of the tests, however, was this 
difference statistically significant. 

The verbal description test was 
scored in the following manner: one 
point was given for every property 
correctly ascribed to a given concept 







name. For every concept an S could 
obtain a score of 0, 1, or 2. The over- 
all score given each S was the total 
of the scores on the four different 
concepts, and could vary from 0 to 8. 
The mean over-all scores on the first 
and second test under the two con- 
ditions of presentation are also pre- 
sented in Table 1. On both tests the 
difference is in the expected direction 
of higher scores following the unmixed 
order of presentation. The one-tailed
p values for the differences on the first
and second tests were less than .03
and .01, respectively, as calculated by
Wilcoxon’s rank total test (7).
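
The scoring rule for the verbal description test can be sketched as below: one point for each defining property correctly ascribed to a concept name, so 0 to 2 per concept and 0 to 8 over all. How incorrectly ascribed properties were treated is not stated in the text; this sketch simply ignores them, which is an assumption, and the names are hypothetical.

```python
DEFINING = {
    "kem": {"up", "circle"},
    "fov": {"up", "square"},
    "haj": {"down", "small"},
    "yug": {"down", "large"},
}


def score_description(reported):
    """Total score (0 to 8) for one S's verbal descriptions.

    `reported` maps a concept name to the properties S ascribed to it.
    Names S did not describe contribute nothing.
    """
    total = 0
    for name, correct in DEFINING.items():
        # Count only properties that match the concept's definition.
        total += len(correct & set(reported.get(name, ())))
    return total


example = {"kem": {"up", "circle"}, "fov": {"up"}, "haj": {"large"}}
example_score = score_description(example)
```

In the example, kem earns 2 points, fov earns 1, and haj and yug earn none.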


Discussion 


The results obtained support the 
conclusion that, under the conditions 
studied, the learning of concepts is more 
rapid when positive instances of the same 
concept appear in close succession than 
when they are separated by instances of 
other concepts. At present, the most 
likely interpretation appears to be that 
under the latter condition, upon the 
appearance of a second instance of a 
given concept, memory of a prior in- 
stance of that same concept is subject 
to considerable interference resulting 
from the interpolation of instances of 
other concepts. Owing to this impair- 
ment of memory, implicit representations 
of two instances of a given concept do 
not regularly occur in close contiguity, 
and the difficulty of abstracting common 
properties of two instances is appreciably 
increased.²


² The descriptive use of the term contiguity
in the present context must be distinguished
from the use of the same term as an explanatory
concept, e.g., in “contiguity” theories of learning
(2). In its latter use the condition of contiguity
is specified as a necessary and sufficient condition
for establishing an association between a stimu-
lus and response; in its present usage the con-
tiguity of implicit responses (perceptual repre-
sentations) is specified as a condition which
facilitates the abstraction of common features of
two stimulus complexes. A more elaborate
conceptual and empirical analysis of the mecha-
nisms involved in concept learning must be
developed before the term contiguity in the
latter sense has great theoretical power.


The foregoing account has been sim-
plified by involving only the factor of
memory of consecutive instances of a
given concept. Undoubtedly other fac-
tors are important, and under different
conditions the relative efficacy of the
two methods of presentation might be
altered and even reversed. The results
might be expected to be influenced by
the type of criterion used and the degree
of learning involved. Gagné’s data (1)
indicate that confusion errors tend to be
made more frequently in the early por-
tions of learning when similar stimuli are
placed in adjacent positions but that this
leads to better differentiation later and
superior final performance in learning
paired associate lists.

Another consideration would be the
degree of discriminability between
stimuli associated with different con-
cepts. When the degree of discrimi-
nability is low, it might be expected that
placing of instances from different con-
cepts in juxtaposition would facilitate
discrimination and learning, whereas
with greater discriminability, like that
obtaining in the present study, the
reverse might obtain. This prediction
might seem to be in contradiction to
results in the Gagné study already
mentioned, but it is to be noted that in
his experiment each stimulus was to be
associated with a different response,
whereas in the current situation stimuli
within a block all involve the same
response (concept).

Finally, the effect of grouping in-
stances may depend upon the general
manner in which Ss set about to solve
the problem. Although not much is
known about the conditions determining
choice of approach, the present authors
have observed that Ss differ in the extent
to which they make use of information
conveyed by concept instances in formu-
lating verbal hypotheses about the
nature of a particular concept. At one
extreme, some Ss seem to “randomly”
formulate and test various possible
hypotheses, while at the other extreme,


some Ss carefully study the concept 
instances presented in an attempt to 
“infer” the common properties, reserving 
the choice of a hypothesis until sufficient 
data are available. In general, fewer 
hypotheses are considered by the latter 
group before arriving at the correct one. 
It seems likely that, among Ss who 
actively attempt to abstract the com- 
mon properties of several instances 
before formulating a hypothesis, the 
unmixed order of presentation would be 
relatively easier than the mixed order, 
but among Ss choosing hypotheses by 
trial and error the difference might be 
considerably reduced. An experimental 
test of this prediction would be provided 
by studying learning under mixed and 
unmixed orders of presentation following 
two different types of instruction and 
pretraining designed to induce different 
methods of solution. 


SUMMARY 


The present study was an exploratory investi- 
gation of the rate of concept attainment under 
two conditions of presentation of concept 
instances. The hypothesis studied was that 
learning would proceed more rapidly under a 
condition in which the instances of a given 
concept were presented one after another without 
interpolation of instances of other concepts as 
compared with a condition in which the instances 
of several concepts were presented in an inter-
mixed order.

The concept materials consisted of geo-
metrical designs varying in color, size, shape
and position. Each concept was defined by a
combination of two properties, e.g., large square,
or small black object. The Ss were presented
one at a time with eight instances (each iden-
tified by the same distinctive nonsense name)
of each of the four concepts. In Cond. I, all
eight instances of a given concept were presented
in succession before presenting instances of a
second concept, etc. In Cond. II, instances of
all four concepts were presented in an inter-
mixed order so that no two instances of a given
concept were presented in succession without
the interpolation of an instance of at least one
other concept. At the end of this training Ss
were asked to give a verbal description of each
concept and to identify several instances of each
concept by the appropriate nonsense name.

Following the unmixed order of presentation
Ss gave both more correct identifications and
more correct verbal descriptions of the concepts
than following the mixed order of presentation.
Only the latter difference was statistically
significant (P = .03 for Trial 1 and .01 for Trial
2, single tail).


REFERENCES 


1. Gagné, R. M. The effect of sequence of
presentation of similar items on the
learning of paired associates. J. exp.
Psychol., 1950, 40, 61-73.

2. Hilgard, E. R. Psychologies of learning.
New York: Appleton-Century-Crofts,
1948.

3. Hovland, C. I. A “communication analy-
sis” of concept learning. Psychol. Rev.,
1952, 59, 461-472.

4. Hovland, C. I., & Weiss, W. Transmission
of information concerning concepts
through positive and negative instances.
J. exp. Psychol., 1953, 45, 175-182.

5. Newman, S. E. The effects of similarity and
contiguity on the formation of concepts.
Unpublished doctor's dissertation, North-
western Univ., 1951.

6. Underwood, B. J. An orientation for re-
search on thinking. Psychol. Rev., 1952,
59, 209-220.

7. Wilcoxon, F. Probability tables for indi-
vidual comparisons by ranking methods.
Biometrics, 1947, 3, 119-122.


(Received May 20, 1955) 





Journal of Experimental Psychology
Vol. 51, No. 4 


REVERSAL AND NONREVERSAL SHIFTS IN CARD-SORTING 
TESTS WITH TWO OR FOUR SORTING CATEGORIES ¹


HOWARD H. KENDLER AND MARK S. MAYZNER, JR.


New York University 


Typically a person who is con- 
fronted with some rather difficult 
intellectual problem is forced to shift 
from one pattern of responses to 
another until problem solution occurs. 
One obvious consideration confronting 
the psychologist who is interested in 
understanding the problem-solving 
process is to comprehend those prin- 
ciples which determine the transition 
between successive patterns of be- 
havior in a problem-solving situation. 

This problem has been approached 
in a circumscribed fashion by Kendler 
and D’Amato (2). They compared 
the relative effectiveness of a reversal 
shift and a nonreversal shift in a 
multiple-solution card-sorting prob- 
lem. For one group (reversal shift) 
the second concept to be learned was 
the reverse of the first concept, i.e., 
the cues were reversed so that S was 
required to sort the cards in a fashion 
opposite to that demanded initially. 
For the other group (nonreversal 
shift) the second concept was un- 
related to the first concept in the 
sense that the basis of the correct 
sorting was shifted from one stimulus 
dimension to an entirely different one. 

The prediction, which was con-
firmed, was that a reversal shift
would occur at a more rapid rate than
a nonreversal shift. This prediction
was generated by a formulation
which assumed that card-sorting be-
havior on any one trial consisted of a
sequence of two successive S-R as-
sociations. According to this formu-
lation, the stimulus component of the
first association would represent the
test cards, while the response would
refer to the implicit symbolic response
made to them. The stimulus of the
second association would represent
the cue produced by the preceding
implicit response while the response
would be the overt card-sorting be-
havior.

¹ This study was supported by the Office of
Naval Research. Reproduction in whole or in
part of this article is permitted for any purpose
of the United States Government.

In the Kendler and D’Amato study 
the card-sorting test consisted of only 
two sorting categories. In such a 
situation the S who receives a reversal 
shift is forced to be correct once he 
abandons his previous mode of re- 
sponding. If the appropriate sym- 
bolic cue persisted until this new 
sorting response was made, it would 
then become rapidly associated with 
the new correct sorting response and 
thereby hasten the transition to the 
correct pattern of card sorting for the 
second concept. When compared to 
the reversal Ss, the nonreversal Ss
should be retarded in their acquisition 
of their second concept because they 
would have to adopt a new implicit 
response which would serve as the cue 
for their new card-sorting responses. 

What would happen if, instead of 
having a test involving two sorting 
categories, a test were used with four 
sorting categories? The reversal 
group would at the time of the shift 
be making the implicit response which 
would provide the appropriate im- 
plicit cue for the learning of the second 
concept. In such a situation, how- 
ever, once the S abandoned his prev- 
ious mode of responding he would 
not necessarily be correct. There 







would be three possible sorting re- 
sponses that S could make as con- 
trasted with only one possible sorting 
response in a two-sorting category 
test. If a reversal S would not 
initially respond in a reversal manner, 
his appropriate implicit response 
would undergo weakening as a func- 
tion of nonreinforcement. The like- 
lihood, therefore, of the appropriate 
implicit cue being present at the time 
the S in a reversal group makes a 
correct sorting response following the 
shift to the second concept would be
greater in a card-sorting test involving 
two categories as compared with a 
situation involving four sorting cate- 
gories. 
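
The contrast between the two tests can be put in numerical form. The sketch below assumes, purely for illustration, that an S who abandons his previous sorting response chooses among the remaining categories with equal likelihood; the text itself states only how many alternative responses are available, and the function name is hypothetical.

```python
from fractions import Fraction


def chance_correct_after_abandoning(n_categories):
    """Chance that the first new sorting response is the correct one.

    Assumes an equal-likelihood choice among the categories other than
    the one S has just abandoned (an illustrative assumption).
    """
    if n_categories < 2:
        raise ValueError("a sorting test needs at least two categories")
    return Fraction(1, n_categories - 1)


two_category = chance_correct_after_abandoning(2)
four_category = chance_correct_after_abandoning(4)
```

Under this assumption the reversal S is certain to be correct in the two-category test but has only one chance in three in the four-category test, leaving more opportunity for the appropriate implicit response to weaken through nonreinforcement.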

This analysis would lead to the 
prediction that the effectiveness of a 
reversal shift over a nonreversal shift 
would be greater in a two sorting 
category situation as compared to one 
involving four categories. This paper
reports the results of an experiment
designed to test this prediction.


METHOD


Subjects.—The Ss were 46 students (21 males
and 25 females) from introductory psychology
courses at Washington Square College of Arts 
and Sciences of New York University. Six of 
these Ss (1 male and 5 females) were eliminated 
because of their inability to learn the first 
concept. 


[Figure 1. Panel labels: Stimulus Cards; Response Cards. Cards are numbered 1 to 4.]

Fig. 1. Four stimulus cards and sample of four response cards.


Material.—A concept formation test of the 
card-sorting variety (1) was developed which 
could be transformed into either a test involving 
two or four sorting categories. Figure 1 shows 
the four stimulus cards and an example of four 
of the 16 different response cards. The stimulus 
cards were designed differently from the response 
cards in order to prevent the sorting of any card 
on the basis of physical identity. Both types 
of cards contained two stimulus elements, one
above the other. The response cards could be 
sorted in relation to the stimulus cards either in 
terms of similarity between the top or bottom 
stimulus elements. 

In the two-sorting category situation only
Stimulus Cards 1 and 2 were used. The two
concepts used in this test were Horizontal-Vertical
(HV) and Straight-Oblique (SO). The
HV concept required the response cards with
radii either at the 12 or 6 o'clock position to be
sorted under the stimulus card with the vertical
arrow and circles with radii either at the 3 or 9
o'clock positions to be sorted under the stimulus
card with the horizontal arrow. According to
this principle Response Cards 1 and 2 would be
sorted below Stimulus Card 1 while Response
Cards 3 and 4 would belong below Stimulus Card
2. The SO concept required the response cards
with lines parallel to two of the sides of the
square, as is the case for Response Cards 1 and
4, to be sorted below the straight line of Stimulus
Card 2, while response cards with oblique lines
in the squares (Response Cards 2 and 3) were
required to be placed below the stimulus card
with the oblique line (Stimulus Card 1).

The two concepts used in the test involving
four sorting categories were Radii-Position (RP)
and Line-Angularity (LA). The RP concept
required the radii of the circles to be sorted under
the arrows on the stimulus cards with the same
clock position, e.g., Response Card 1 below
Stimulus Card 1, Response Card 2 below Stimulus
Card 3, Response Card 3 below Stimulus Card
4. The LA concept required the lines of the
squares to be sorted under the lines on the
stimulus cards with the same angularity, e.g.,
Response Card 1 below Stimulus Card 3, Response
Card 2 below Stimulus Card 1, Response
Card 3 below Stimulus Card 4.

There were a total of 16 response cards, one
for each of the possible combinations of the two
stimulus elements.


The experimental procedure also necessitated 
the learning of the reverse of the above concepts. 
This was accomplished in both of the two- and 
four-category tests by requiring Ss to sort the 
cards in a manner opposite to that required to 
learn the direct concepts, as described above. 
For example, in learning the Reverse Horizontal- 
Vertical concept (RHV) in the two-category 







situation, Response Cards 1 and 2 would be 
correctly sorted below Stimulus Card 2 because 
the radii at the 3 and 9 o'clock positions are 
opposite to the 6 to 12 o'clock plane of the hori- 
zontal arrow in Stimulus Card 2. In the four- 
category situation the Reverse Radii-Position 
concept (RRP) would require Response Card 1 
to be sorted below Stimulus Card 3 and Re- 
sponse Card 2 below Stimulus Card 1. 

Design.—The purpose of this study was to 
determine if the superiority of a reversal shift 
over a nonreversal shift decreases as the number 
of sorting categories increases. Thus a 2 × 2
factorial design was used for both the two and 
four sorting categories situations. 

With two sorting categories half the Ss 
learned initially the Horizontal-Vertical concept 
(HV) and the other half learned the Straight- 
Oblique concept (SO); then half of the Ss in 
each of these groups learned the Reverse 
Horizontal-Vertical concept (RHV) and the 
other half of each group learned the Reverse 
Straight-Oblique concept (RSO). Thus four 
groups were formed: HV-RHV, HV-RSO, SO- 
RHV, and SO-RSO. The first and last groups 
are examples of a reversal shift while the other 
two involve a nonreversal shift. 

With four sorting categories half the Ss 
learned initially the Radii-Position concept (RP) 
and the other half learned the Line-Angularity 
concept (LA). Then half the Ss in each of these
groups learned the Reverse Radii-Position concept
(RRP) and the other half of each group
learned the Reverse Line-Angularity concept
(RLA). The four groups formed were: RP-RRP,
RP-RLA, LA-RRP, and LA-RLA, with
again the first and last groups representing a 
reversal shift, while the other two represent a 
nonreversal shift. 
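
The group structure described above can be enumerated mechanically. The following is a minimal sketch, not part of the original report; the function names `shift_type` and `design` are illustrative choices, and the only rule assumed is the one stated in the text, namely that a shift is a reversal when the second concept is the reverse of the concept learned first.

```python
from itertools import product

# First concepts available in each sorting situation (from the Design section).
FIRST_CONCEPTS = {2: ["HV", "SO"], 4: ["RP", "LA"]}


def shift_type(first, second):
    """A shift is a reversal when the second concept is the reverse of the first."""
    return "reversal" if second == "R" + first else "nonreversal"


def design():
    """List (number of categories, group label, shift type) for all eight groups."""
    rows = []
    for n_categories, concepts in FIRST_CONCEPTS.items():
        for first, base in product(concepts, repeat=2):
            second = "R" + base  # second concept is always a reversed concept
            rows.append((n_categories, f"{first}-{second}", shift_type(first, second)))
    return rows


if __name__ == "__main__":
    for row in design():
        print(row)
```

In each sorting situation the first and last groups listed come out as reversal groups and the middle two as nonreversal groups, in agreement with the description in the text.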

This experimental design involved the learn- 
ing initially of the HV, SO, RP, or LA concept. 
The Ss were assigned randomly to these four 
groups. The criterion of learning any of these 
four concepts was 15 consecutively correct 
sorting responses. After reaching the criterion 
of learning, each group was divided into half 
with subgroups matched on trials taken to learn 
the first concept. The second concept was 
learned immediately after the first concept and 
it too had a criterion of 15 consecutively correct 
sorting responses.

Procedure.—The stimulus cards were placed 
in holders on a display panel below which small 
crosses appeared, which S was to tap, in order 
to indicate his sorting choice. The S was read 
the following instructions by E at the beginning
of the experiment. 

“You will be shown a series of cards one at a 
time. Some of these cards will belong with this 
(E points to the first stimulus card) card, and 


HOWARD H. KENDLER AND MARK S. MAYZNER, JR. 


others with this one (E points to the second
stimulus card). You must decide, for each of
the cards that I shall show you, with which of
these 2 (or 4) cards [E points to stimulus cards]
it belongs. Indicate your choice by taking this 
pointer and gently tapping the mark which lies 
beneath the card of your choice. If your choice 
is correct, I shall say ‘Right,’ but if it is incorrect, 
I shall say ‘Wrong.’ Your object is to get as 
many as possible right. Do you have any 
questions?”

The transition from the first to the second 
concept was made without informing the S of 
any change in the scoring procedure. The Ss 
who failed to learn the first concept after the 
pack of 16 response cards had been presented 10 
times were eliminated from the experiment. 
The Ss who failed to learn the second concept 
within 160 trials were assigned a score of 160. 
The response deck was presented in an order
that was designed to preclude sequential
learning.
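
The scoring rule can be made concrete with a short sketch. It is an illustration only: the function name `trials_to_criterion` is hypothetical, the score is taken to exclude the 15 criterion trials (as the response measure is defined in the Results section), and it is assumed here that the 160-trial limit counts every trial presented, including criterion trials.

```python
def trials_to_criterion(outcomes, run_length=15, max_trials=160):
    """Score one S: trials taken before the criterion run began.

    outcomes   -- sequence of booleans, True for a correct sorting response
    run_length -- consecutive correct responses required (15 in the text)
    max_trials -- limit after which the S is assigned the maximum score
    """
    streak = 0
    for trial, correct in enumerate(outcomes, start=1):
        if trial > max_trials:
            break
        streak = streak + 1 if correct else 0
        if streak == run_length:
            # Exclude the criterion trials themselves from the score.
            return trial - run_length
    return max_trials


if __name__ == "__main__":
    # Three errors followed by an unbroken run of 15 correct responses.
    print(trials_to_criterion([False] * 3 + [True] * 15))
```

Under this rule an S who sorts correctly from the first card receives a score of 0, which is consistent with the ranges beginning at 0 in the table of first-concept scores.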


RESULTS AND DISCUSSION


The response measure used during 
the learning of the first and second 
concepts was the number of trials 
(excluding the 15 criterion trials) 
required to achieve the performance 
criterion. Table 1 presents the results
of the learning of the first concept.
The U test was used to
compare the differences because of
the marked skewness of the scores.
The P values were doubled to be
consistent with a two-tailed test.
The differences between the two 
groups of either the two- or four-sorting 
category situation were not significant 


TABLE 1

Mean, Median, and Range of Number of Trials to Learn First Concept

Number of Sorting Categories    Group    Median    Range
2                               HV
                                SO                 0-48
                                                   0-131
                                RP                 0-40
                                LA                 0-87




























TABLE 2

Mean, Median, and Range of Number of Trials to Learn Second Concept

Number of Sorting Categories    Group    Median    Range
                                                   10-100

















suggesting that the initial concepts 
were of equal difficulty. 

The major results of this experi- 
ment are the scores associated with 
the learning of the second concept. 
The results of the two reversal and 
two nonreversal groups with equi- 
valent number of sorting categories 
were combined. This was justified
because these groups had not differed
significantly in their learning of the
first or second concepts in spite of
the fact that both their first and
second concepts were different. Table
2 reports the results of the learning 
of the second concept. These results 
are consistent with our prediction 
that as the number of sorting cate- 
gories increases, the relative super- 
iority of a reversal shift over a non- 
reversal shift would decrease. In 
order to evaluate the significance of 
this interaction effect by an analysis 
of variance technique, the raw scores 
were subjected to a fourth-root trans- 
formation which resulted in equal 
variances among the groups. The 
analysis of variance as indicated in 
Table 3 reports a significant interaction
effect, thus confirming our
prediction.


This significant interaction must be 
analyzed with respect to both variables, 
i.e., the nature of the shift and the 
number of sorting categories. The com- 
parisons between the means of the 




transformed scores of the reversal and 
nonreversal groups in both the two- and 
four-category situations produced similar 
results to those presented in Table 2, 
i.e., significant at the .01 level in the
two-category situation and not sig- 
nificant for the four-category situation. 
In terms of the number of sorting cate- 
gories, we found that a reversal shift, 
using the transformed scores, was faster 
(P = .02) in a two-category situation 
(Mean = 1.35) than it was in a four- 
category situation (Mean = 2.24) and 
that there was no difference (P = .20) 
between the mean of the nonreversal 
groups in a two- (Mean = 2.97) and a 
four-category situation (Mean = 2.45).
It should be mentioned that even if this 
comparison were made with the raw 
scores by the use of the U test, the dif- 
ference between a nonreversal shift in a 
two- and four-category situation would 
not be significant (P = .12). 
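
The interaction can be read directly from the four transformed means quoted in this paragraph. The sketch below is illustrative and not part of the original analysis: the names `fourth_root`, `reversal_advantage`, and `interaction_contrast` are hypothetical, and the contrast computed is simply the difference between the reversal advantages in the two situations, not the F ratio of the analysis of variance.

```python
def fourth_root(trials):
    """Fourth-root transformation applied to a raw trials-to-criterion score."""
    return trials ** 0.25


# Means of the transformed scores as reported in the text.
TRANSFORMED_MEANS = {
    ("reversal", 2): 1.35,
    ("reversal", 4): 2.24,
    ("nonreversal", 2): 2.97,
    ("nonreversal", 4): 2.45,
}


def reversal_advantage(n_categories):
    """Nonreversal mean minus reversal mean; positive values favor reversal."""
    return (TRANSFORMED_MEANS[("nonreversal", n_categories)]
            - TRANSFORMED_MEANS[("reversal", n_categories)])


def interaction_contrast():
    """How much the reversal advantage shrinks from two to four categories."""
    return reversal_advantage(2) - reversal_advantage(4)


if __name__ == "__main__":
    print(round(reversal_advantage(2), 2))
    print(round(reversal_advantage(4), 2))
    print(round(interaction_contrast(), 2))
```

On these figures the reversal advantage is 1.62 transformed units with two sorting categories and 0.21 with four, a difference of 1.41.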

This last analysis of the effects of each 
of the two factors individually would 
suggest that the reason the relative 
superiority of a reversal shift over a 
nonreversal shift decreases as the number 
of sorting categories increases is largely 
due to the significantly poorer perform- 
ance of the reversal group in the four- 
category situation as compared to the 
two-category situation. This of course 
would be in keeping with our hypothesis. 

In spite of these consistent findings, 
some reservations about our hypothesis 
should be mentioned. As it now stands, 
it is sort of a James-Lange theory of 
problem solving; one makes the overt


TABLE 3

Analysis of Variance of 4th-Root Transformation of Number of Trials to Learn Second Concept

Source                      Sum of Squares
2-4 sorting categories
R-NR groups
Interaction
Within (error)



















correct sorting response and if the ap- 
propriate symbolic cue is present, then 
problem solution will occur. The factors 
responsible for the occurrence of the 
correct sorting response prior to its
being associated with the appropriate 
implicit cue would certainly need to be 
considered in any formulation of concept- 
formation behavior in a card-sorting 
situation. In order to do this, the 
present sequential formulation involving 
two successive S-R associations would 
have to be expanded. 


SUMMARY 


A mediational S-R formulation of concept- 
formation behavior was tested by comparing the 
relative speeds of a reversal and a nonreversal 
shift in a card-sorting test involving two or four 
sorting categories. The prediction was made
that the relative effectiveness of a reversal shift
over a nonreversal shift would be greater in a 
card-sorting test involving two sorting cate- 
gories as compared with one involving four 
sorting categories. The results were consistent
with this prediction in showing that a reversal 
shift was significantly faster than a nonreversal 
shift in a two-category situation but that there 
was no difference between two types of shifts in 
a four-category situation. 


REFERENCES 


1. Berg, E. A. A simple objective technique
for measuring flexibility in thinking. J.
gen. Psychol., 1948, 39, 15-22.

2. Kendler, H. H., & D'Amato, M. F. A
comparison of reversal shifts and nonreversal
shifts in human concept-formation
behavior. J. exp. Psychol., 1955,
49, 165-174.


(Received May 3, 1955) 





Journal of Experimental Psychology 
Vol. 51, No. 4, 1956 


PREDICTION OF RESPONSE IN VERBAL HABIT 
HIERARCHIES ¹


LLOYD R. PETERSON 
Indiana University 


The word-association technique as 
generally employed measures responses
acquired by subjects under a
variety of unknown conditions of 
learning. The experiment to be de- 
scribed was designed to explore the 
possibilities of experimentally estab- 
lishing intraverbal response proba- 
bilities. The strength of responses 
with a known reinforcement history 
would then be measured by a word- 
association technique. 

In introducing the concept of the 
habit hierarchy, Hull (6) stated that 
when a given stimulus was followed 
by a different and incompatible re- 
sponse in each of several stimulus 
contexts, a hierarchy of responses 
would become conditioned to this 
stimulus. On occurrence of the 
stimulus in a new context, a com- 
petition would result which would be 
won by the strongest of the excitatory 
tendencies. Following this descrip- 
tion the present experiment used four 
pairs of nonsense syllables as stimulus 
contexts and four common English 
nouns as responses. The Ss were 
conditioned to respond verbally by 
the method of paired associates in a 
modified memory drum. One mem- 
ber of each pair of nonsense syllables 
appeared in two contexts and was 
thus conditioned to two different 
responses. Since one context oc- 
curred twice as often as the other 
during acquisition, the responses in 
the resulting hierarchy were of differential
strength. Two hierarchies
were established simultaneously, following
which each nonsense syllable
was presented singly after the manner
of a word-association test.

¹ The major findings of this experiment were
reported at the 1955 meeting of the Midwestern
Psychological Association.

The experiment lends itself to 
prediction by the statistical learning 
theory developed by Estes (2, 3, 4). 
It is essentially a double discrimi- 
nation learning problem under cor- 
rection procedures in which some 
cues are reinforced differentially while 
others are irrelevant. Since emission 
of the correct response, either aloud 
or subvocally, terminated each trial, 
after a long series of trials the proba- 
bility of response would be expected 
to approximate the probability of 
reinforcement. This would be 1.00 
for responses to unambiguous non- 
sense syllables, .67 for high-frequency 
hierarchical responses, and .33 for 
low-frequency hierarchical responses. 

Due largely to changed instructions,
however, there is a generalization
decrement from acquisition to test
conditions, which should be taken
into account in any prediction. Two
stimulus sets are present in the test
situation: $S_1$, representing stimulation
present during acquisition, and
$S_2$, stimulation not previously encountered.
Let $N_1$ and $N_2$ represent
the number of stimulus elements in
$S_1$ and $S_2$ respectively, with $N$ being
the total number of elements in the
situation. Let $\theta_1$ and $\theta_2$ represent the
probabilities that any element in $S_1$
or $S_2$ is sampled on a given trial, with
the mean sampling ratio over the
entire set,

$$\bar{\theta} = \frac{1}{N}\,(N_1\theta_1 + N_2\theta_2). \tag{1}$$

Let $F_1$ be the proportion of elements
in $S_1$ connected to a given response
$A_1$ at the beginning of test trials, and
let $F_2$ be the proportion of elements in
$S_2$ connected to $A_1$. Then the probability
of response $A_1$ occurring in the
test situation is the weighted mean of
these proportions:

$$p(A_1) = \frac{1}{N\bar{\theta}}\,(N_1\theta_1 F_1 + N_2\theta_2 F_2). \tag{2}$$

If now we let $N_2\theta_2 / N\bar{\theta} = \gamma$, it follows
from Equation 1 that $N_1\theta_1 / N\bar{\theta} = 1 - \gamma$.
By substitution and rearrangement
Equation 2 becomes

$$p(A_1) = F_1 - \gamma\,(F_1 - F_2). \tag{3}$$

The constant $\gamma$ may be considered an
index of dissimilarity. When $N_2 = 0$,
the test situation being identical to
acquisition, $\gamma = 0$ and $p(A_1) = F_1$.
When $N_2 = N$, the situation being
totally different from acquisition,
$\gamma = 1$ and $p(A_1) = F_2$. Equation 3
is an expression for stimulus generalization
derived from statistical learning
theory. In the present experiment
$F_1 = \pi$ and $F_2 = .25$, the chance level
of response when four alternative
responses are possible. The constant
$\gamma$ can be estimated from the responses
to unambiguous nonsense syllables in
order to predict the frequencies of
responses to the ambiguous syllables.
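
Equation 3 lends itself to a short numerical illustration. The sketch below is not part of the original report: the function names `predicted_probability` and `estimate_gamma` are hypothetical, the dissimilarity index is called `gamma` in the code, and the estimator is only an algebraic rearrangement of Equation 3, assumed here to be how an index could be recovered from the unambiguous syllables. The reinforcement probabilities (1.00, 2/3, 1/3) and the chance level of .25 are those stated in the text, and the index value of .12 is the one reported in the Results section.

```python
def predicted_probability(f1, gamma, f2=0.25):
    """Equation 3: p(A1) = F1 - gamma * (F1 - F2)."""
    return f1 - gamma * (f1 - f2)


def estimate_gamma(observed_p, f1=1.0, f2=0.25):
    """Rearrangement of Equation 3 for the dissimilarity index.

    observed_p is the relative frequency of correct responses to the
    unambiguous syllables, for which F1 is taken to be 1.00.
    """
    return (f1 - observed_p) / (f1 - f2)


if __name__ == "__main__":
    gamma = 0.12  # value reported in the Results section
    for label, f1 in [("unambiguous", 1.0), ("high-frequency", 2 / 3), ("low-frequency", 1 / 3)]:
        print(label, round(predicted_probability(f1, gamma), 3))
```

With an index of .12 the predicted probabilities are about .91, .62, and .32 for the unambiguous, high-frequency, and low-frequency responses respectively; setting the index to 0 or 1 returns the two limiting cases described in the text.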


METHOD


Subjects.—The Ss were 43 student volunteers 
from the author’s introductory psychology 
courses at Indiana University Gary Center. 
The criterion of learning was not met by 11 of 
these and data from 32 Ss are herein presented. 

Apparatus.—The experiment was conducted 
in a classroom, with an electric fan being run 
during all phases of the experiment to mask 
extraneous sound. The S was seated before a 
black screen, 54 ft. high, in which two § by 2-in. 
apertures had been cut at eye level. On E's
side of the screen a 65-in. continuous belt of 
paper was extended over two 5-in. cylinders. 
Manual turning of one of the cylinders permitted 
typed words to appear in the aperture not closed 




TABLE 1

Learning Materials

Responses: foot, king, salt, moon

Stimulus pairs: DAX CEF; JID WUB; JID LAJ





by a shutter at a given time. Use of two 
apertures permitted the test words to be typed 
in a second column beside the materials exposed 
during acquisition. A .1-sec. clock was used to
time responses during test trials. 

Learning materials—The learning materials 
are shown in Table 1. The nonsense syllables 
were of 0% association value (5). The nouns 
were (a) among those words found by Thorndike 
and Lorge (8) to occur with a frequency of 100 
times or more per million words in various 
samples of written language, and (b) among the 
stimulus words in the Minnesota norms for the 
Kent-Rosanoff word association test (7), but 
were not among the responses to these words in 
their sample. 

Procedure—The Ss were instructed to 
attempt to anticipate the response word during 
the 2-sec. presentation of the pair of nonsense 
syllables. Then both stimulus and response
items appeared together for 2 sec. The nonsense 
syllables were typed in black, while the English 
nouns were typed in red. It may be seen from 
Table 1 that one nonsense syllable in each pair 
was unique to a given response, while the other 
nonsense syllable was associated with two re- 
sponses. The apparatus permitted seven blocks 
of six trials each to be run before the sequence 
repeated. In each block of six trials the two 
high-frequency responses appeared twice while 
the low-frequency responses appeared once. 
Varying sequential orders as well as random 
left-right orders were used, with the restriction 
that a given pair was not presented more than 
twice in succession. For half the Ss moon and 
king were the high-frequency responses, while 
for the others salt and foot were the high- 
frequency responses. 
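
The arithmetic of the acquisition schedule can be checked with a brief sketch. It is illustrative only: the constant and function names are hypothetical, and the only figures used are those given in the text (two hierarchies, two presentations of each high-frequency response and one of each low-frequency response per block, seven blocks per sequence).

```python
from fractions import Fraction

N_HIERARCHIES = 2        # two ambiguous syllables, one hierarchy each
HIGH_PER_BLOCK = 2       # presentations of a high-frequency response per block
LOW_PER_BLOCK = 1        # presentations of a low-frequency response per block
BLOCKS_PER_SEQUENCE = 7  # blocks run before the sequence repeated


def trials_per_block():
    """Each hierarchy contributes its high- and low-frequency presentations."""
    return N_HIERARCHIES * (HIGH_PER_BLOCK + LOW_PER_BLOCK)


def trials_per_sequence():
    return BLOCKS_PER_SEQUENCE * trials_per_block()


def reinforcement_probability(kind):
    """Share of an ambiguous syllable's presentations paired with each response."""
    count = HIGH_PER_BLOCK if kind == "high" else LOW_PER_BLOCK
    return Fraction(count, HIGH_PER_BLOCK + LOW_PER_BLOCK)


if __name__ == "__main__":
    print(trials_per_block(), trials_per_sequence())
    print(reinforcement_probability("high"), reinforcement_probability("low"))
```

The 2:1 presentation ratio gives the reinforcement probabilities of .67 and .33 cited in the introduction, and the 42-trial sequence accounts for the rest periods at 84 and 168 trials and the limit of 252 trials, which are two, four, and six passes through the sequence.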

Learning proceeded to a criterion of correct 
anticipation for three successive blocks of six 
trials. Rest periods of 20 sec. were given after 
84 and 168 trials. The Ss who did not meet the 
criterion within 252 trials were eliminated. 

The following instructions were read 20 sec. 
after the criterion had been met: 

“You will now see a series of single nonsense 
syllables appear one by one in the other window, 
separated for a moment by blank paper. Your 
task is to respond to each of these nonsense 







syllables with one of the English words which 
you have just learned. Do not hesitate, but 
speak aloud the first of the words that comes to 
your tongue. The words will follow one another 
without any indication of a right or a wrong 
response.” 

Each syllable was then exposed in the 
aperture of the apparatus until S made a re- 
sponse. Blank paper then appeared in the 
aperture for about 12 sec., followed by the 
appearance of the next stimulus syllable. The 
test series was arranged so that the same 
response theoretically would not be evoked 
twice in succession. An ambiguous nonsense 
syllable was never preceded by an unambiguous 
stimulus for either of its associated responses. 
Two sequences were planned for each of the two 
acquisition arrangements varying the order of 
the ambiguous syllables. An ambiguous syl- 
lable never appeared as the first syllable in the 
test series. The dual purposes behind the 
sequences in the test series were (a) to use one 
of the numerically expendable unique nonsense 
syllables as a practice trial, since an exploratory
experiment had shown a higher proportion of 
errors on this first trial than on the others, and 
(b) to ensure as high a degree of independence 
as possible among responses. 


RESULTS AND DISCUSSION


The expected probabilities of re- 
sponse to ambiguous syllables in the 
second column of Table 2 are based 
on a $\gamma$ value of .12, which was computed
from response frequencies to
the unambiguous nonsense syllables,
excluding the first in the series. The
observed relative frequencies of response
to ambiguous syllables are not
significantly different from the expected
values on t tests.

As a test of the significance of 


TABLE 2

Predicted and Observed Responses on Test Trials

Observed

DAX    JID


differences between the high-fre- 
quency hierarchical responses and the 
responses to unambiguous nonsense 
syllables, the statistic Q was employed.
Q was devised by Cochran
(1) for the comparison of two or more
percentages in matched samples.
The obtained Q was 22.0. Since Q
follows the chi-square distribution,
in this case with 5 df, the differences
are significant at the .001 level.
Thus, contrary to Hull’s early pre- 
diction, the stronger member of a 
hierarchy did not completely domi- 
nate the weaker member to the extent 
of excluding the latter's occurrence. 
Presumably, Hull would have used 
behavioral oscillation to explain this 
finding. 
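
The significance level quoted for Q can be verified from the chi-square distribution. The sketch below is an illustration, not the author's computation: the function name `chi2_sf_df5` is hypothetical, and it uses the closed-form upper-tail probability that holds for 5 degrees of freedom, so that only the standard library is needed.

```python
import math


def chi2_sf_df5(x):
    """Upper-tail probability of a chi-square variate with 5 df.

    For 5 df the regularized upper incomplete gamma function reduces to
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3).
    """
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


if __name__ == "__main__":
    p = chi2_sf_df5(22.0)  # obtained Q from the text
    print(p < 0.001)
```

The tail probability for Q = 22.0 falls below .001, in agreement with the level reported.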

The geometric mean latencies in 
test trials were 1.8, 1.8, and 2.1 for 
responses to unambiguous stimuli, 
high-frequency hierarchical responses, 
and low-frequency hierarchical re- 
sponses, respectively. After conver- 
sion of response times to logarithms 
because of positive skew, t tests were
run between high- and low-frequency
hierarchical responses as well as
between responses to ambiguous as
against unambiguous stimuli. No
significant differences were found.
The model used does not predict
differences in the present situation.
In experiments in which latencies
have been used as a measure of response
strength, the time to the
termination of a trial by a single
designated response has generally
been used (2).
It may be asked if there are any factors 
in acquisition upon which response in 
test trials is dependent other than probability
of reinforcement? Can it be
shown, for instance, that an individual 
responds in test trials with the word 
first associated with the ambiguous 
syllable in acquisition? This can be 
tested by computing the probability of 







a high-frequency response given that 
this response was learned first. Simi- 
larly the probability of a high-frequency 
response, given that the low-frequency 
response was learned first, can be ob- 
tained. If the two probabilities are 
equal to the over-all probability of a 
high-frequency response, and thus to 
each other, then occurrence of the high- 
frequency response is independent of 
which response was learned first. Using 
as a criterion of which response was 
learned first the observation of which of 
a set of hierarchical responses was last 
in error during acquisition, the proba- 
bility of a high-frequency response given 
that this response was learned first is .52. 
The probability of a high-frequency 
response, given that the low-frequency 
response was learned first, is .63. Since 
the over-all probability of a high- 
frequency response is .59, it may be 
concluded that the dependency in ques- 
tion has not been demonstrated. 


SUMMARY 


Two verbal response hierarchies were experimentally
established using a paired-associate
technique with a modified memory drum. A
word-association test then showed that occurrence
of responses in the hierarchies approximated
predictions based on a statistical theory
of learning. The feasibility of experimentally
establishing intraverbal response probabilities
for the investigation of verbal behavior problems
was demonstrated.


REFERENCES 


1. Cochran, W. G. The comparison of percentages
in matched samples. Biometrika,
1950, 37, 256-266.

2. Estes, W. K. Toward a statistical theory
of learning. Psychol. Rev., 1950, 57,
94-107.

3. Estes, W. K., & Burke, C. J. A theory of
stimulus variability in learning. Psychol.
Rev., 1953, 60, 276-286.

4. Estes, W. K., & Straughan, J. H. Analysis
of a verbal conditioning situation in terms
of statistical learning theory. J. exp.
Psychol., 1954, 47, 225-234.

5. Glaze, J. A. The association value of
nonsense syllables. J. genet. Psychol.,
1928, 35, 255-267.

6. Hull, C. L. Simple trial and error learning:
a study in psychological theory. Psychol.
Rev., 1930, 37, 241-256.

7. Russell, W. A., & Jenkins, J. J. The
complete Minnesota norms for responses
to 100 words from the Kent-Rosanoff
word association test. Studies on the
role of language in behavior, Univ. of
Minnesota, Tech. Rep. No. 11, August,
1954.

8. Thorndike, E. L., & Lorge, I. The teacher's
word book of 30,000 words. New York:
Bureau of Publications, Teachers College,
Columbia Univ., 1944.


(Received May 16, 1955) 





Journal of Experimental Psychology 
Vol. 51, No. 4, 1956


EFFECT OF ANXIETY, MOTIVATIONAL INSTRUCTIONS, 
AND FAILURE ON SERIAL LEARNING 


IRWIN SARASON ¹
Indiana University 


The present study deals with the 
effects of one individual difference 
variable, anxiety, and two instruc- 
tional variables on performance in a 
serial learning situation. Anxiety was 
defined as score on the Taylor Anxiety 
Scale. The instructional variables 
were high and low motivating in- 
structions and failure and nonfailure 
reports. 

A review of studies in the literature 
on verbally induced motivation re- 
veals no clear-cut agreement in results 
with respect to the effects of moti- 
vation on performance. Alper (1) 
studied the effects of high and low 
motivating instructions on the memo- 
rization of nonsense syllables and 
numbers, and found no significant 
differences due to these instructions 
immediately after their administra- 
tion. However, on a test of retention 
24 hr. later the highly motivated 
groups were significantly superior to 
the low-motivated groups.² But a
somewhat similar procedure used by 
Russell (12) failed to obtain sig- 
nificant differences due to motivating 
instructions either in original learning 
or in retention 24 hr. later. Russell 
did find that high motivating instructions
resulted in significantly
greater variability of performance
than did low motivating instructions.

¹ This article is based on a thesis submitted
to the Department of Psychology of Indiana
University in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
The writer is indebted to the chairman of his
doctoral committee, Dr. Harry G. Yamaguchi,
for his advice and assistance throughout the
study. Dr. William K. Estes contributed many
helpful suggestions in the design of the experiment
and preparation of the manuscript.

² I. G. Sarason and B. R. Sarason, in an unpublished
study, confirmed this finding.

Studies on the effects of experi- 
mentally induced failure in general 
show deterioration in performance to 
result from failure. However, studies 
have been reported in which failure 
has resulted in improvement in per- 
formance and in increases in vari- 
ability of performance (7, 8). Lucas 
(9) found failure reports to have 
opposite effects for groups of Ss 
selected from extremes of the Taylor 
Anxiety Scale. 

The present study was performed 
with two main objectives in mind. 
The first objective was to determine 
whether manipulation of an individual 
difference variable, anxiety, could 
shed additional light on the effects of 
motivational and failure instructions. 
In this case the question was: Do 
the effects of verbal instructions on 
performance depend on the degree of 
anxiety of Ss? 

A second purpose of the present 
experiment was to shed light on the 
form of the relationship between levels 
of anxiety and verbal learning per- 
formance. The bulk of the research 
using the Taylor Scale has employed 
Ss selected from extremes of the 
distribution of Taylor scores. Montague
(11), however, has reported
data which suggest that the Taylor
Scale may discriminate only extremely
high-anxious individuals from the rest
of the distribution of Taylor scores.
Spence and Taylor (13) have reported
some evidence which points to a
linear relation between anxiety and
rate of eyelid conditioning, and other
data which suggest a curvilinear
relation. It was hoped that the 
inclusion of a group of Ss with Taylor 
scores at about the median of the 
total distribution would shed light on 
the form of the relationship between 
anxiety and serial learning. 


METHOD


Experimental Design 


A three-factor design was employed, with 
two levels of two variables and three levels of 
the third. The levels of the three variables 
were: high, middle, and low anxiety; high- and 
low-motivation instructions; neutral and failure 
reports. 


Subjects 


The Ss were 180 students in introductory 
psychology classes at Indiana University. 
There were 99 males and 81 females. They 
were divided into 12 groups, representing all 
combinations of levels of anxiety, high- and low- 
motivation instructions, and failure and non- 
failure. An equal number (N = 15) of high-, 
middle-, and low-anxious Ss were randomly 
assigned to each experimental group. The 
high- and low-anxious Ss were selected from 
students scoring in the upper and lower 10% 
of the distribution of Taylor Anxiety Scale 
scores for all students in these classes during the 
fall semester of 1954. The middle-anxious Ss 
were drawn from those students who scored 
between the 45th and 55th percentiles in this 
administration of the Taylor Scale. Following 
the procedure suggested by Taylor (14), the 
Biographical Inventory including the 50 anxiety 
items was employed. The mean for the ob- 
tained distribution of 1175 scores was 16.97. 
Both the mean and frequency distribution ob- 
tained from this sample in the present experi- 
ment indicated a small difference between 
Taylor’s normative group and the present group, 
with the latter exhibiting greater frequencies of 
high scores. 

High-anxious Ss used in the experiment 
ranged in Taylor score from 29 to 41, low- 
anxious Ss ranged from 2 to 7, and middle- 
anxious Ss had scores of 15 or 16. Prior to the 
experiment it was decided that high-, middle-, 
and low-anxious Ss with T scores of 70 or more
on the L or K scales of the MMPI would be 
discarded. None of the Ss had T scores as 
great as 70 on the L scale; 24 high-K Ss were 
discarded. All of these high-K Ss were from the 
low-anxious group. 

The Ss were not informed of any connection 




between the group-administered anxiety ques- 
tionnaire and their participation in the memory 
experiment. All Ss were naive with respect to 
verbal learning experimentation. 


Procedure 


Using a Hull-type memory drum and the 
anticipation method, two lists of 17 nonsense 
syllables each were presented to Ss with a 2-sec. 
exposure per syllable and a 6-sec. intertrial 
interval. Each list was composed of syllables 
randomly selected from Glaze’s (6) list of 
nonsense syllables of 73% association value, 
with the restriction that syllables appearing in 
one list could not appear in the other. 

The E was unaware of the anxiety scores of
the Ss during the experiment. Also, he did
not know whether or not an S was to be failed
until just prior to administration of failure or
nonfailure reports. This was accomplished by
having a person other than E handle the
assignment procedure.

Each S was given the necessary preliminary 
instructions and 15 trials on the first list (List 
A). The preliminary instructions were: 


“You will see nonsense syllables appearing 
in this opening one at a time. After a syllable 
is presented, call out the next one before it 
appears. Of course, the first time through the 
list you won’t be able to anticipate any syllables, 
but after that call out the syllable before it 
appears on the screen. Prior to the beginning 
of each new trial there will be a short rest in 
which you will see blank spaces in the opening. 
Asterisks will indicate that the first syllable in 
the list will appear next. When you see them, 
call out the syllable. Do you understand?” 


Following the 15 trials on List A, Ss in the 
high- and low-anxious groups were randomly 
assigned to receive high- and low-motivation 
instructions. These instructions were as follows:


High motivation —“Do I have your name 
spelled correctly? How old are you? What is 
your major? What class are you in? What is 
your grade point average? Do you expect to 
get a Bachelor's degree? The first list was a 
practice list. Here is another list. This is a 
short-form intelligence test. It involves the 
memorization of nonsense syllables as in ordinary 
verbal learning experiments. However, the list 
you have to learn is one which measures intel- 
ligence and ability to think in abstract terms. 
Pay close attention to each syllable, since each 
one missed lowers your score when it is compared 
with other people of your age.” 


Low motivation.—“Here is another practice 
list. I am attempting to investigate certain
characteristics of nonsense syllables, the main 
one being the relative association value of the 
syllable. What concerns me is not the subject’s 
performance but rather the list characteristics 
uncovered. This is a common procedure in 
memory-drum experiments.” 


After verbal administration of these in- 
structions, all groups received 14 trials on the 
second list (List B), following which Ss in each 
group were randomly assigned to one of two 
conditions: one condition, failure, consisted of 
verbally administering instructions informing S 
that he had failed, and the other condition con- 
sisted of an equal interval of neutral conversation 
with S about campus activities. The failure 
instructions were as follows, with S addressed 
by his first name: 


“You seem to be having trouble with your 
test. Is anything wrong? Can you see the 
syllables clearly? How do you feel? You've 
been doing much worse than the other people 
who have worked on this task. In fact, yours 
is one of the lowest scores I’ve gotten so far, 
and you are one of the few people I’ve had who 
has not reached the college level on this test. 
That’s why I asked if anything was the matter. 
You've only gotten right. Usually people 
get that many right in half the time it’s taken 
you.” 

After the failure and nonfailure instructions, 
each S was given one more trial, the fifteenth, 
on List B. 

Following this, each S was told to return 24
hr. later. At this time, each S was given six
more trials on List B. After this second
session, all Ss who had been given failure
instructions were told that their performance on
the test of retention had been excellent.


RESULTS 


Bartlett’s test for homogeneity of
variance was calculated for all trials
on which analyses of variance were
performed. None of the χ²’s obtained
approached significance. Thus the
assumption of homogeneity was
considered justified.

Before it is possible to evaluate the 
effect of experimental treatments, it 
is necessary to have some measure of 
whether or not the groups in the 
experiment differed significantly in 
learning ability prior to administra- 
tion of the treatments. In order to 
determine this for the present study, 




TABLE 1 


Means and SD’s for Total Correct
Anticipations on Trials 1-14
of List B





Anxiety 


Mean s SD 


11.892
31.213
29.540


74.900 
Medium 76.667 
High 87.167 


32.693 
28.734 
23.078 


Low 








a 2 × 2 × 3 analysis of variance was
performed on the summated number
of correct anticipations on the 15
trials of List A. All F’s were
nonsignificant. Therefore, it is believed
that any significant results obtained
on trials of the second list could be
attributed to the treatments.

In order to determine the effect of
the high and low motivating
instructions on the performance of the
treatment groups, a 2 × 3 analysis
of variance was performed on the
summated number of correct
anticipations for Trials 1 through 14 of List
B. The only significant effect was
the Anxiety × Motivation (A × M)
interaction. This interaction was
significant beyond the .001 level of
confidence. The high-anxious,
high-motivation groups anticipated
significantly fewer syllables than did the
high-anxious, low-motivation groups,


TABLE 2

Analysis of Variance of Total Correct
Anticipations on Trials 1-14
of List B

Source of Variation      Mean Square
Anxiety                        4.450
Motivation                  1024.081
A × M                       7290.511
Within groups                852.320
Total

*P < .001.







TABLE 3
Means and SD’s for Number of Correct Anticipations on Trial 15 of List B

                  High Motivation                  Low Motivation
              Failure        Neutral          Failure        Neutral
Anxiety     Mean    SD     Mean    SD       Mean    SD     Mean    SD
Low         7.000  4.456  10.667  4.868     8.067  3.751   9.933  3.900
Medium      5.000  3.982  11.867  2.615    10.333  3.697  11.467  3.483
High        8.800  3.707  10.667  2.420     6.733  5.007   8.267  2.865




















whereas the middle- and low-anxious
groups which received the
high-motivation instructions were superior
to middle- and low-anxious groups
receiving low-motivation instructions.
The means and SD’s of the 6
treatment groups and the results of the
analysis of variance for these trials are
presented in Tables 1 and 2.

A 2 × 2 × 3 analysis of variance
on the number of correct anticipations
on the first postfailure trial, Trial 15
of List B, was performed in order to
determine the immediate effects of
failure and neutral reports to Ss. The
means and SD’s of the experimental
groups and the results of the analysis
of variance are presented in Tables
3 and 4, respectively.

That the failure reports resulted in 
marked decrements in performance is 


TABLE 4

Analysis of Variance of Number of Correct
Anticipations on Trial 15 of List B

Source of Variation      Mean Square       F
Anxiety (A)                   17.550
Motivation (M)                 0.800
Failure (F)                  358.420    25.67***
A × M                         82.850     5.94**
A × F                         19.857
M × F                         77.300     5.54*
A × M × F                     29.235
Within groups                 13.962
Total








clearly reflected in the highly sig- 
nificant main effect for failure (P 
< .001). On the other hand, Ss 
given neutral instructions continued 
to perform at about the same level as 
on Trial 14 of List B. As on the 
previous trials, the interaction between 
Anxiety and Motivation was found 
to be highly significant (P < .005), 
with the motivating instructions again 
having opposite effects for the high-
as compared with the low- and middle-
anxious groups. Also, the Motivation
× Failure interaction was found
to be significant (P < .025), with 
high-motivation, failed Ss performing 
at a higher level than low-motivation, 
failed Ss, and high-motivation, non- 
failed Ss performing at a lower level 
than low-motivation, nonfailed Ss. 
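Each F reported for Trial 15 is the ratio of an effect's mean square to the within-groups mean square, so the printed values can be checked directly. The sketch below does this for the three significant effects. It assumes that the Table 4 mean squares are read in the order Failure = 358.420, Anxiety × Motivation = 82.850, Motivation × Failure = 77.300, and within groups = 13.962; that assignment is inferred here from the order of the entries and is not stated explicitly in the table as it survives.

```python
# Mean squares as read from Table 4 (Trial 15, List B); the assignment of
# values to effects is an inference from the order of the entries.
MS_WITHIN = 13.962

mean_squares = {
    "Failure": 358.420,
    "Anxiety x Motivation": 82.850,
    "Motivation x Failure": 77.300,
}

# F values as printed in Table 4.
printed_f = {
    "Failure": 25.67,
    "Anxiety x Motivation": 5.94,
    "Motivation x Failure": 5.54,
}


def f_ratio(ms_effect, ms_error=MS_WITHIN):
    """F is the effect mean square divided by the error mean square."""
    return ms_effect / ms_error


computed_f = {name: f_ratio(ms) for name, ms in mean_squares.items()}

for name, f_value in computed_f.items():
    print(f"{name}: computed F = {f_value:.3f}, printed F = {printed_f[name]:.2f}")
```

Under this reading the computed ratios agree with the printed F's to within rounding in the second decimal place.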

To assess the effects of the treat- 
ments 24 hr. after their administra- 
tion, an analysis of variance was 
performed on the number of correct 
anticipations on Trial 16 of List B. 
The only significant effect obtained 
was the Anxiety X Motivation inter- 
action (P < .025). This interaction 
was the same as that found significant 
on both Trials 1-14 and Trial 15 of List 
B. It is interesting to note the dissipation
of the effect of failure after 24 hr.
despite the sizable decrement obtained 
immediately after administration of 
failure reports. 

On Trial 21, the last trial of the
second day, once again the only
significant effect was the Anxiety
× Motivation interaction (P < .005).
As on the previous trials, the high- 
anxious, high-motivation groups per- 
formed at a lower level than did the 
high-anxious, low-motivation groups. 
On the other hand, low- and middle- 
anxious groups performed better 
under high- than low-motivation in- 
struction. 

In view of the continued significance
of the Anxiety × Motivation
interaction after 24 hr., an
attempt was made to obtain measures
of Ss’ performance after a still longer
interval of time. It was not possible
to recall all Ss for this purpose.
It was possible, however, to obtain
120 Ss (10 per treatment) for a
measure of their performance approximately
39 days after administration
of the verbal instructions. Seven
trials (Trials 22-28) were administered
at this time. A 2 × 2 × 3
analysis of variance on Trial 28
revealed that the only significant
effect was the Anxiety × Motivation
interaction (P < .025). As previously,
high-anxious groups performed
at a higher level under low than under
high motivation. Low- and middle-
anxious groups performed better
under high than under low motivating
instructions.


Discussion 


The finding in this experiment of an 
interaction between anxiety and differ- 
ential motivating instructions indicates 
the importance of assessing personality 
differences among experimental groups. 
On all of the trials of the test list (List 
B) on which statistical tests were per- 
formed, high-anxious, high-motivation 
groups performed at a lower level than 
did high-anxious, low-motivation groups. 
The reverse was the case among low- 
and middle-anxious groups, with high- 
motivation instructions resulting in a 
higher level of performance than low-
motivation instructions. The manipulation
of the anxiety variable has, therefore,
produced a result that requires a
revision of the generalization drawn from 
experiments such as Alper’s (1). Alper 
found that Ss strongly motivated by 
verbal instructions performed at a higher 
level than did low-motivated Ss 24 hours 
after, but not immediately after, ad- 
ministration of these instructions. In 
light of the present results, Alper’s 
suggestion of a greater facilitation of 
performance for high as compared with 
low motivation still may be tenable, but
only for low- and middle-anxious Ss as 
defined by scores on the Taylor scale. 
Furthermore, when the anxiety variable 
is manipulated, the differential moti- 
vating instructions, in interaction with 
anxiety, do have an effect both im- 
mediately and 24 hr. after administration 
of the instructions. 

Two possible theoretical interpretations
suggest themselves on the basis of
the findings concerning the interaction
between anxiety and motivational
instructions. One, a drive interpretation,
stems from Taylor and Spence’s view
that score on the Taylor scale reflects the
general drive level of the individual and
that this drive has the properties of
Hull’s D (15, 16). In the present
experiment, high motivating instructions 
were facilitative for middle- and low- 
anxious groups, and detrimental for 
high-anxious groups. It might be hy- 
pothesized that high motivating in- 
structions increased the general drive 
level of Ss, whereas low motivating 
instructions left it unchanged. If this 
were the case, it is possible that up to a 
certain point increases in motivational 
level are facilitative for performance 
and beyond this point increases in 
motivational level are detrimental. 
Another hypothesis places greater 
emphasis on associative factors. Such 
an hypothesis would consider high- 
anxious Ss as having learned certain 
detrimental responses to situations simi- 
lar to that with which Ss were confronted 
in the present experiment. For example, 
high-anxious Ss might place a high 
premium on excellence of performance in 





258 


situations in which their behavior is 
being evaluated (e.g., taking an intel- 
ligence test) and might verbalize to 
themselves during their performance 
about the importance of a high level of 
attainment. Quite conceivably such 
verbalizations could have an interfering 
and detrimental effect on their per- 
formance. For low-anxious Ss on the 
other hand, high motivating instructions 
may act as a stimulus for increased 
effort in the performance of the task 
with which they are confronted. This 
habit hypothesis is similar to interpre- 
tations recently offered by Mandler and 
Sarason (10), and Child (2). 

The present experiment failed to show 
significant differences between the three 
anxiety groups on the preliminary list 
(List A). Introduction of the moti- 
vational instructions produced virtually 
identical effects on performance for the 
low- and middle-anxious Ss. Performance
of low- and middle-anxious groups
given high motivation was superior to
that of those given low-motivation instructions,
whereas high-anxious groups performed 
better under low than under high moti- 
vation instructions. In considering the 
implications of these results, two factors 
appear to be of considerable importance. 
One pertains to the sensitivity of the 
Taylor Scale as a measure of a drive 
similar to Hull’s D; the other concerns 
the effect of certain task variables on the 
performance of Ss differing in anxiety 
level. 

With regard to the sensitivity of the 
Taylor Scale, the significant Anxiety 
X Motivation interaction and the failure 
to obtain differences between low- and 
middle-anxious groups furnish some evi- 
dence for the conclusion that only the 
highest anxiety scores delineate a group 
distinguishable from the rest of the Ss 
in the distribution. The Taylor Scale 
may dichotomize Ss rather than order 
them, at least with respect to perform- 
ance in a serial learning situation. If 
this is true, the validity of one assump- 
tion underlying much current research 
with the Taylor Scale would appear to 
be questionable. It is quite possible, 
however, that the scale is sensitive only
in the upper region of the score distribution,
i.e., increases in Taylor scores
may reflect differences in D only at the 
higher end of the distribution of Taylor 
scores. To test this possibility, it would 
be necessary to compare the performance 
of Ss scoring, for example, at the 70th, 
80th, and 90th percentile on the Taylor 
Scale. In this connection, Montague 
(11) has reported a study in which Ss 
scoring between the 80th and 90th 
percentiles were indistinguishable in 
verbal learning performance from low- 
anxious Ss. 

The role of task variables is of par- 
ticular relevance in considering the lack 
of significance among the three anxiety 
groups on the preliminary list. Farber 
(3) has summarized several studies which 
indicate that the advantage of non- 
anxious over anxious Ss is a positive 
function of the probable number and 
strength of the competing responses. 
This would be the case since, according 
to this theory, increasing the anxiety 
level and consequently the number of 
superthreshold responses, would increase 
the number of errors (16). The long 
list (17 syllables) used in this experiment 
would seem to indicate the presence of a 
large number of competing responses. 
Taking this into consideration, one might 
predict, for a task such as the one posed 
to Ss in this experiment, that the high- 
anxious group would perform more 
poorly than the middle-anxious group, 
and the low-anxious group would perform 
at the highest level of the three groups. 
The F ratio for anxiety of less than unity 
on List A would appear to contradict 
this prediction. It must be remembered, 
however, that the variable of intralist 
similarity has been shown by Montague 
(11) to be of relevance in the performance 
of anxious and nonanxious Ss. Since 
this variable was not manipulated in this 
experiment and the list used was not of 
high similarity when compared with 
Montague’s criteria, the results of the 
preliminary list cannot be regarded as 
definitely opposed to predictions stem- 
ming from the drive theory. 

Whereas anxiety proved to be a relevant
variable with respect to the effect
of differential motivating instructions,
the effects of failure were found to be 
independent of anxiety level of Ss. 
Previous findings (4, 12) of a highly 
significant effect of failure immediately 
after administration of failure and non- 
failure instructions, but not after 24 hr., 
were confirmed in the present experi- 
ment. All failed groups in the present 
study, regardless of their Taylor scores, 
showed marked decrements in level of 
performance immediately after failure. 
An Anxiety X Failure interaction, simi- 
lar to that obtained by Lucas (9), was 
not found. It is clear that the failure 
reports administered were quite intense 
deleterious stimuli, and that the apparent 
stressfulness of the stimuli had completely
dissipated within 24 hr. This
immediate and temporary effect of 
failure reports is in contrast with the 
relatively long-lasting joint effect of 
anxiety and motivation which was ob- 
tained. In terms of the conception of 
an optimal facilitative drive level, the 
failure reports can be viewed as increasing
the drive level of all failed
groups beyond the optimal point.
Another interesting effect of failure 
concerns the significant interaction be- 
tween the motivating instructions and 
failure reports immediately after failure 
(Trial 15, List B). Among failed Ss, 
high-motivation groups performed at a 
higher level than did low-motivation 
groups, while among nonfailed Ss, high-motivation
groups performed at a lower level than did
low-motivation groups. This finding does not
easily fit any of the existing theories of motivation.*

*That the result did not arise purely by
chance is suggested by the similar unpublished
study by Sarason and Sarason in which the
identical result was obtained at the .06 level of
significance.


SUMMARY


The present experiment, using a 2 × 2 × 3
analysis of variance design, was concerned with
the effects of three motivational variables on
performance in a serial learning situation. The
levels of the variables employed were (a) high,
middle, and low anxiety as defined by score on
the Taylor Anxiety Scale, (b) high and low
motivating instructions administered prior to the
learning of a test list, and (c) administration of
failure and nonfailure reports during the course
of learning.

Using the anticipation method, the 180 Ss
were administered 15 trials of a preliminary list,
following which the differential motivational
instructions were administered. Immediately
after this, Ss received 14 trials on the test list,
following which half the Ss were told they had
failed, while the others were engaged by E in
neutral conversation. One further trial on the
test list was administered following the
differential failure instructions. Twenty-four hours
later Ss returned for six additional trials on the
test list.

The results revealed that high motivational
instructions were detrimental for high-anxious
groups and facilitating for low- and middle-
anxious groups. Furthermore, this Anxiety ×
Motivation interaction continued to be
significant 24 hr. after, and over one month after,
administration of the motivational instructions.
The performance of failed Ss was found to be
significantly poorer than that of nonfailed Ss
immediately after administration of failure
reports; however, 24 hr. later the effects of
failure had completely dissipated. At no time
did the high-, middle-, and low-anxious groups
perform differently on the basis of the anxiety
variable alone.

The results were taken as a further verification
of the influence of individual differences
(e.g., anxiety) variables on performance. Two
interpretations of the results were considered:
one emphasized associative factors involved in
the learning of certain deleterious responses by
high-anxious individuals, while the other stressed
the motivational aspects of anxiety.


REFERENCES 


1. Alper, T. G. Task-orientation vs. ego-
orientation in learning and retention.
Amer. J. Psychol., 1946, 59, 236-248.

2. Child, I. L. Personality. Ann. Rev.
Psychol., 1954, 5, 149-171.

3. Farber, I. E. The role of motivation in
verbal learning and performance.
Psychol. Bull., 1955, 52, 311-327.

4. Farber, I. E., Russell, W. A., & Andreas,
B. G. Effect of failure upon performance
in verbal and motor learning situations.
A review of work done at the State
University of Iowa. Paper read at
Midwest. Psychol. Ass., Chicago, April,
1949.

5. Farber, I. E., & Spence, K. W. Complex
learning and conditioning as a function
of anxiety. J. exp. Psychol., 1953, 45,
120-125.

6. Glaze, J. A. The association value of
nonsense syllables. J. genet. Psychol.,
1928, 35, 255-267.

7. Hurlock, E. B. The value of praise and
reproof as incentives for children. Arch.
Psychol., N. Y., 1924, 11, No. 71.

8. Lazarus, R. S., & Eriksen, C. W. Effects
of failure stress upon skilled performance.
J. exp. Psychol., 1952, 43, 100-105.

9. Lucas, J. The interactive effects of anxiety,
failure, and intraserial duplication.
Amer. J. Psychol., 1952, 65, 166-173.

10. Mandler, G., & Sarason, S. B. A study
of anxiety and learning. J. abnorm. soc.
Psychol., 1952, 47, 166-173.

11. Montague, E. K. The role of anxiety in
serial rote learning. J. exp. Psychol., 1953,
45, 91-96.

12. Russell, W. A. Retention of verbal
material as a function of motivating
instructions and experimentally induced
failure. J. exp. Psychol., 1952, 43, 207-216.

13. Spence, K. W., & Taylor, J. A. The
relation of conditioned response strength
to anxiety in normal, neurotic, and
psychotic subjects. J. exp. Psychol.,
1953, 45, 265-272.

14. Taylor, J. A. A personality scale of
manifest anxiety. J. abnorm. soc. Psychol.,
1953, 48, 285-290.

15. Taylor, J. A. The relationship of anxiety
to the conditioned eyelid response. J.
exp. Psychol., 1951, 41, 81-92.

16. Taylor, J. A., & Spence, K. W. The
relationship of anxiety to level of
performance in serial learning. J. exp.
Psychol., 1952, 44, 61-64.

(Received May 3, 1955)





Journal of Experimental Psychology 
Vol. 51, No. 4, 1956 


TESTS OF TWO THEORIES OF DECISION IN AN 
“EXPANDED JUDGMENT” SITUATION 


FRANCIS W. IRWIN, W. A. S. SMITH, AND JANE F. MAYFIELD¹


University of Pennsylvania


Some years ago Cartwright and
Festinger (2, 4, 5) proposed a theory
of decision and offered evidence in
support of a considerable number of
deductions from it. Without strict
adherence to their terminology, the
theory may be outlined as follows for
the case of a psychophysical procedure
in which the method of constant
stimuli is used in its two-category
form. It is assumed that, when two
stimuli are presented for comparison,
competing tendencies are aroused in
S to make each of the two permissible
responses (e.g., to say “greater” and
to say “less”). These two response
tendencies are supposed to persist for
some time, during which they fluctuate
in strength in a normal distribution
and in perfect negative correlation
with each other. The
difference between the two tendencies,
which will be called the resultant
tendency, will therefore itself vary
normally over time. It is further
assumed that there exists a restraint
against making a decision, which
operates against the resultant tendency
and likewise has a normal
temporal distribution. The theory
then asserts that a decision will occur
when and only when, in the random
temporal pairing of resultants and
restraints, a value of the resultant is
greater than the corresponding value
of the restraint.


1 We express our appreciation to Mrs. Louise
Dyckman for assistance in treating the data of
these experiments.

2 The various directions taken by “theory of
decision” are described in the excellent review
by Edwards (3).


In applying their theory, Cart- 
wright and Festinger coordinated the 
mean of the distribution of resultants 
to the magnitude of the stimulus 
difference. They also coordinated the 
fluctuation of the resultants to the 
fluctuation of confidence as measured 
by confidence ratings made by S. 
However, they did not suggest how 
the variance of this fluctuation might 
be controlled experimentally. 

This theory has two interesting
consequences that have apparently
gone unnoticed. The first is that,
other things equal, S’s confidence in
his judgment should be independent
of the duration of the decision process,
since confidence is supposed to vary
randomly with time and no temporally
cumulative effects are introduced.³
The second is that, if the
variance of the distribution of resultants
can be varied, no change will
be expected in S’s mean confidence,
if the mean is calculated algebraically
with opposite signs given to opposite
directions of judgment; while S’s
mean absolute confidence, disregarding
direction, will vary directly with
the magnitude of the variance of
the resultants. The latter is true
because, with increased variability
of the resultant tendency, a greater
portion of its distribution will fall
beyond the region of restraint, and
large values (which are coordinated
to high confidence) will occur more
frequently, while the reverse holds
for decreased variability of the
resultant.

³ For the special case of zero confidence,
Cartwright and Festinger (2, P oll, note)
assume “that the subject can give a confidence
rating which will reflect the average potency for
the duration of the decision.”

Before we noted these implications 
of the theory, we had been led to 
quite different expectations, namely, 
that confidence would vary directly 
with the duration of a decision process 
and that it would vary inversely with 
the variability of the resultant tend- 
ency, whether means are taken with 
or without regard to direction of 
judgment. These expectations
emerged from the general notion that 
the psychological processes involved 
in decision-making might have a 
model, at least to some extent, in the 
processes whereby statistical decisions 
are made. If this notion has any 
merit, we would suppose that a 
decision process must involve some 
accumulation of information with the 
passage of time, so that confidence 
would increase as a function of time. 
Again, on these general heuristic
grounds, we would expect that the
greater the variability of the resultants,
the less on the average would
be S’s confidence in his judgment, 
just as a statistical “‘confidence level” 
is inversely related to the variability 
of the data upon which it is based. 
That an increase in variability should 
lead to an increase in mean absolute 
confidence seems so implausible that 
we might even try to argue that 
organisms operating on such a prin- 
ciple would be eliminated by the 
processes of natural selection. 
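The implication attributed to Cartwright and Festinger, that mean absolute confidence rises with the variability of the resultant, can be illustrated with a small Monte Carlo sketch. It rests on several assumptions that are not in the text: the function name `simulate_decisions` and all parameter values are hypothetical, a decision is taken to occur at any moment when the magnitude of the resultant exceeds the paired restraint, and confidence is taken to equal the signed value of the resultant at that moment.

```python
import random


def simulate_decisions(sd_resultant, mean_resultant=0.0,
                       mean_restraint=1.0, sd_restraint=0.25,
                       n_moments=20000, seed=0):
    """Return (algebraic mean, absolute mean) of confidence at decision.

    Assumed model: at each moment a resultant and a restraint are drawn
    independently from normal distributions; a decision occurs when the
    magnitude of the resultant exceeds the restraint.
    """
    rng = random.Random(seed)
    signed_confidence = []
    for _ in range(n_moments):
        resultant = rng.gauss(mean_resultant, sd_resultant)
        restraint = rng.gauss(mean_restraint, sd_restraint)
        if abs(resultant) > restraint:
            # Confidence is coordinated to the value of the resultant.
            signed_confidence.append(resultant)
    n = len(signed_confidence)
    if n == 0:
        raise ValueError("no decisions occurred; increase n_moments")
    algebraic = sum(signed_confidence) / n
    absolute = sum(abs(c) for c in signed_confidence) / n
    return algebraic, absolute


low_variability = simulate_decisions(sd_resultant=1.0)
high_variability = simulate_decisions(sd_resultant=3.0)
```

With a zero mean resultant the algebraic mean stays near zero at both settings, while the absolute mean is larger when the resultant is more variable. This is the direction the theory implies and the opposite of the expectation stated above.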

No feasible means of controlling 
adequately the duration of an ordi- 
nary psychophysical process or the 
variance of the distribution of re- 
sultants in such a process has occurred 
to us. We have therefore proceeded 
by inventing what we shall call an 
“expanded judgment” situation as an 
analog to the psychophysical process. 
The adequacy of this analog may, of
course, be questioned, but it appears
to have some interest of its own. 

In the present experiment the ex- 
panded judgment situation involves 
showing S a series of cards upon each 
of which a number is printed. These 
cards are at the top of a pack of 500 
cards, and S is informed (incorrectly) 
that the whole pack has been thor- 
oughly shuffled. His task is to ob- 
serve the numbers as they are shown, 
and to form a judgment as to whether 
the mean of the whole pack is greater 
than, or less than, zero. An estimate 
of zero is not permitted. He must 
also give ratings of his confidence in 
the accuracy of his judgment, using a 
100-point scale. In such a procedure, 
E can control the mean of the numbers 
seen by S, their variability, and the 
number of numbers shown, which we 
take as analogs, respectively, to the 
magnitude of a stimulus difference, 
the variability of the distribution of 
resultants, and the duration of the 
decision process in a conventional 
psychophysical situation. 

Our predictions were, then, (a) that 
mean confidence ratings (computed 
algebraically) would vary directly 
with the mean of the numbers shown 
to S, (b) that the absolute magnitude 
of mean confidence ratings (computed 
with or without regard to direction of 
judgment) would vary inversely with 
the SD of the numbers shown to S, 
and (c) that the absolute magnitude 
of mean confidence ratings (computed 
algebraically) would vary directly 
with the number of numbers shown to 
S. If Cartwright and Festinger were 
to accept our analog, they should 
agree with our first prediction. It 
appears, however, that they should 
predict that the absolute magnitude 
of mean confidence ratings would be 
independent of the SD of the numbers 
shown if computed algebraically and 
directly related to the SD if computed
with direction of judgment disregarded,
and that confidence would be
independent of the number of num- 
bers shown. 
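The two ways of averaging confidence ratings that these predictions turn on can be made concrete with a short sketch. The ratings below are invented for illustration only, and the function name `mean_confidence` is hypothetical.

```python
def mean_confidence(signed_ratings):
    """Return (algebraic mean, absolute mean) of signed confidence ratings.

    A positive rating means "greater than zero", a negative rating means
    "less than zero"; the magnitude is the confidence on the 100-point scale.
    """
    n = len(signed_ratings)
    algebraic = sum(signed_ratings) / n                # signs retained
    absolute = sum(abs(r) for r in signed_ratings) / n  # direction disregarded
    return algebraic, absolute


# Hypothetical ratings from four judgments of one pack.
example_ratings = [50, -20, 30, -10]
algebraic_mean, absolute_mean = mean_confidence(example_ratings)
```

Ratings of opposite sign partly cancel in the algebraic mean but not in the absolute mean, which is why the two theories can make different predictions for the two measures.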


Experiment I
Method 


Subjects:—The Ss were 48 students in ele- 
mentary psychology at the University of 
Pennsylvania, 24 men (Group M) and 24 
women (Group F). None had had a course in 
statistics or training in testing statistical 
hypotheses. They participated as volunteers 
and received “laboratory credit.” Each S was 
seen singly. 

Materials and procedure.—The E sat across a 
table from S and gave the following instructions: 


“I have here a pack of 500 cards. Each of 
these cards has a number on it. Some of the 
numbers are positive and some are negative. I 
am going to show you a few of these cards, one 
at a time, from the top of the pack. All 500 
cards have been thoroughly shuffled, so that the 
cards I will show you will be whatever ones 
happen by chance to be toward the top of the 
pack. After each card is shown, please write 
down on this record sheet the following things: 
1. The number that was just shown. 2. Your 
best judgment, on the basis of the numbers you 
have seen, as to whether the average of the whole 
set of 500 numbers is greater than zero or less 
than zero, and your confidence in this judgment. 
You will make this judgment and confidence 
rating by referring to this scale. For example, 
if you think the average is greater than zero and 
if you have medium confidence in this judgment, 
you will record a value of plus 50. If you think 
the average is less than zero and if you have 
almost no confidence in your judgment, you will 
record a minus 10 or minus 20 depending on how 
confident you are, and so on. In each case, 
decide first whether you think that the average 
is plus or minus, and then give the most accurate 
confidence rating that you can, using any value 
between zero and 100. Please note that you 
cannot properly use either plus 100 or minus 100 
as confidence ratings, since you cannot be 
absolutely certain of the average of the 500 
numbers from the small sample that you will 
see. Also, we want you not to use a rating of 
zero unless you feel that you are absolutely
forced to do so. After you have written down 
each number and your judgment and confidence 
rating, please cover up what you have recorded.” 

The S was shown the top 20 cards of each 
pack (the remaining cards being dummies). 
Eight different packs were used with each S and 




two additional packs with the Ss of Group M. 
The top 10 cards had numbers distributed as 
nearly normally as possible for a given mean, 
SD, and class interval of .5; but the SD’s were 
so large that no duplicate numbers occurred 
within any set of 10. The second 10 cards were 
duplicates of the first 10. Each set of 10 cards 
was shuffled by hand separately for each S. 
Thus, different card orders were used for different 
Ss, but all Ss had seen the same cards from a 
given pack at the end of 10 exposures and of 20 
exposures. The means and SD’s of the various 
packs can be seen in Table 1 and will be conjoined
for identification of the packs; e.g., “Pack
.5, 2.0” refers to the pack with a mean of .5 and 
an SD of 2.0. The distributions of numbers 
may be illustrated by that of Pack .5, 7.5, each 
of whose two sets of 10 cards contained the 
following numbers: 13.5, 9.0, 6.0, 3.5, 1.5, 
−.5, −2.5, −5.0, −8.0, −12.5.
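
The stated properties of this pack can be checked directly. The following is a minimal Python sketch, not part of the original report. It reads the sixth value as −.5, which is the reading that the stated mean of .5 and class interval of .5 require, and it assumes that the SD was computed with divisor n, which the text does not state.

```python
from statistics import mean, pstdev

# One set of 10 cards from Pack .5, 7.5, as listed in the text
# (sixth value read as -.5; this reading is an assumption).
PACK_05_75 = [13.5, 9.0, 6.0, 3.5, 1.5, -0.5, -2.5, -5.0, -8.0, -12.5]

pack_mean = mean(PACK_05_75)
# Population SD (divisor n); the divisor is an assumption, not stated in the text.
pack_sd = pstdev(PACK_05_75)
# The text says no number is duplicated within a set of 10.
has_duplicates = len(set(PACK_05_75)) != len(PACK_05_75)
# A class interval of .5 means every value is a multiple of .5.
on_half_grid = all((2 * x).is_integer() for x in PACK_05_75)

print(f"mean = {pack_mean:.2f}, SD = {pack_sd:.2f}")
```

Under these assumptions the set has a mean of .5 and an SD of about 7.50, with no duplicated values.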

The confidence scale alluded to in the in- 
structions was presented in the form of a hori- 
zontal line 10 in. long. Scale divisions from 
−100 at the left to 100 at the right were marked
and numbered at 10-point intervals. A large 
plus sign was drawn above the right half of the 
scale and a similar minus sign over the left. The 
words “absolute certainty” were printed beneath 
the ends of the scale, the words “medium con- 
fidence” beneath the plus and minus 50, and the 
words “pure guess” beneath the zero. 

The packs were presented in different orders 
to different Ss so as to achieve approximate 
counterbalancing. Cards were exposed at intervals
of 5 to 10 sec. The Ss were not informed
during the experiment whether or not their 
judgments were correct. 


Results and Discussion 


Table 1 gives the mean confidence
ratings* for each of the packs for the
tenth and twentieth cards. The signs
of the ratings of the packs with 
negative means have been reversed 
so that positive values always repre- 
sent “correct” judgments. As can 
be seen, the differences between 
Groups F and M were not large or 
consistent in direction and do not 
require further discussion. The same 


* Here and throughout the remainder of this 
article, tables and figures included, the mean 
confidence ratings were computed algebraically, 
with opposite directions of judgment given 
opposite signs, unless it is specifically stated 
otherwise. 





F. W. IRWIN, W. A. S. SMITH, AND J. F. MAYFIELD 


TABLE 1 


Mean Confidence Ratings of Groups F and M at the 10th and 20th Cards
from Packs with Various Means and SD’s


Each mean is based upon 24 ratings. Signs of ratings from packs with negative means are
reversed. Median SD of distributions of confidence ratings = 25 (see text).
Numbers in parentheses are frequencies of “incorrect” judgments.





SD of Pack = 7.5 


10 Cards Seen 


SD of Pack = 2.0 


10 Cards Seen 





20 Cards Seen 20 Cards Seen 


Gp. F 


Gp. M 
30 (2) 
25 (1) 
63 (0) 
68 (0) 
75 (0) 
78 (0) 


19 (1) 10 (5) 
14 (5) 


10 (1) 


35 (0)
46 (0) 


53 (0) 
53 (0) 


77 (0) 
76 (0) 


20 (2)


43 (0) 
46 (2) 


39 (1) 
38 (2) 


43 (0) 
51 (0) 


























was true of the differences between 
packs with positive and negative 
means. All of the mean confidence 
ratings were in the correct direction 
except that for Group M for 10 cards 
from Pack .5, 7.5. Since none of the 
distributions were extremely skewed, 
the means are fairly representative 
values. 


The SD’s of the confidence ratings 
whose means appear in Table 1 
ranged from 17 to 38 with a median 
of 25. In 15 of 18 comparisons the 
SD was greater at Card 20 than at 
Card 10; this discrepancy from symmetrical
distribution is reliable beyond the .01 level
by binomial expansion. No other clear relation
appeared between the SD of the 
confidence ratings and the experi- 
mental variables (subject groups, and 
mean, SD, and sign of the distri- 
butions of cards seen). 
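
The binomial argument can be illustrated as follows. This is a sketch only: it assumes a simple sign test with p = .5 and doubles the upper tail to obtain a two-tailed probability, whereas the original does not say which tail convention was used. The function name is illustrative.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X distributed as Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 15 of the 18 comparisons showed a larger SD at Card 20 than at Card 10.
one_tailed = binomial_upper_tail(15, 18)
two_tailed = min(1.0, 2 * one_tailed)

print(f"one-tailed P = {one_tailed:.4f}, two-tailed P = {two_tailed:.4f}")
```

Fifteen or more increases out of 18 correspond to 988 of the 2^18 equally likely outcomes, so the probability falls below .01 under either convention.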

Figure 1 presents the mean con- 
fidence ratings in a different form. 
Each plotted point is the mean rating 
over all Ss for two packs that differed 
only in the sign of their means; e.g., 
the lowest point to the left in the 


[Figure 1 plot: ordinate, mean confidence rating; abscissa, calculated t.]


Fig. 1. Mean confidence ratings as a function
of t values for tests of the null hypothesis that
the mean of a pack is equal to zero. Each
plotted point is the mean over Groups F and
M for the two packs with the same (absolute)
mean and SD (except the points for 1.5, 2.0,
which were obtained from Group M only).
Points for 10 and 20 cards from the same packs
are connected by straight lines; that for 10 cards
is always the lower of the two. Sets of packs
are identified by conjunction of their (absolute)
mean and SD.


figure is the mean of the four values 
in Table 1 for Groups M and F at 10 
cards from Packs .5, 7.5 and −.5, 7.5.
The abscissa represents t values calculated
for tests of the null hypothesis
of a mean of zero. For graphic
purposes, t was chosen rather than P
because it spread the data better for 


5 In connection with this treatment of the
data, cf. Bilodeau (1). 







inspection, but the reader should note 
that, although the P’s corresponding 
to a given t differ little as between 10 
and 20 cards (i.e., 9 and 19 df), the 
difference would be greater for smaller 
numbers of cards. 
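
As an illustration of how the abscissa values could be obtained, the sketch below computes t for the three sets of packs named in the text. It assumes that the pack SD's were computed with divisor n, so that t = M√(n − 1)/SD; the report does not give the formula. Because the second 10 cards duplicate the first 10, the mean and SD are unchanged at Card 20 and only n differs. The function name is hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean: float, sd_divisor_n: float, n: int) -> float:
    """t for the null hypothesis of a zero population mean.

    Assumes sd_divisor_n was computed with divisor n, so the standard
    error of the mean is sd_divisor_n / sqrt(n - 1).
    """
    return sample_mean * sqrt(n - 1) / sd_divisor_n


# (absolute mean, SD) of the sets of packs named in the text.
PACKS = [(0.5, 7.5), (0.5, 2.0), (1.5, 2.0)]

t_values = {
    (m, sd, n): one_sample_t(m, sd, n)
    for m, sd in PACKS
    for n in (10, 20)  # 9 and 19 df
}

for (m, sd, n), t in t_values.items():
    print(f"Pack {m}, {sd}: n = {n:2d}, t = {t:.2f}")
```

On these assumptions, doubling the number of cards raises t by the factor √(19/9), or about 1.45, for every pack.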


From Table 1 and Fig. 1 it can readily 
be seen that our expectations concerning 
the effects of varying the SD and the 
number of cards seen by S were realized, 
and that those of Cartwright and 
Festinger’s theory (to the extent that 
they are applicable) were contradicted. 
Thus, in all of the 16 pairs of mean 
ratings under conditions varying only 
in SD, the confidence was greater for 
the smaller SD. Similarly, the mean 
confidence was greater at Card 20 than 
at Card 10 in all of the 16 comparisons 
in which number of cards seen was the 
only condition that was varied. It was 
also true without exception, as both 
views would expect, that the mean 
confidence increased with increase in the 
absolute value of the mean of the cards 
seen. These outcomes are consistent 
over so many comparisons that further
statistical treatment is unnecessary. We 
regard them as strong support for the 
view that the psychological decision 
processes operating in the present situ- 
ation were affected cumulatively by 
increasing information. 

The mean confidence ratings were 
also computed without regard to sign. 
In 14 of the 16 possible comparisons 
these means were smaller for the greater 
SD, thus supporting our expectation and 
opposing the direct relationship that 
Cartwright and Festinger would appear 
to predict. 

An additional finding, not surprising 
in the light of these results, is that the 
frequency of Ss making “incorrect”
judgments—i.e., judgments with a sign 
opposite to that of the mean of the 
numbers seen—was inversely related to 
the confidence measures. Table 1 gives 
in parentheses the numbers of Ss who 
made such judgments. They can be 
seen to be related inversely to the 
absolute value of the mean of the cards
and directly to the SD; this is reflected 




in their inverse relation to confidence.
They fail, however, to decrease con- 
sistently with increase in the number of 
cards seen. 


Experiment II* 


In this experiment, the expanded 
judgment situation was set up so as 
to require S to judge from limited 
samples which of two packs of num- 
bered cards had the higher mean. 
The purpose was to attempt to con- 
firm the findings of Exp. I in a more 
complex situation and thus to permit 
broader generalization. At the same 


time, since t and P could be calculated 
for the new situation as well as the 
former one, it was possible to deter- 
mine whether mean confidence ratings 
were similar for similar values of t and 
P in the two different situations. 


Method 


Subjects.—The Ss were 36 students in
elementary psychology at the University of
Pennsylvania in the summer session of 1954, 
of whom 11 were men and 25 were women. 
One male S had had a course in statistics, but 
he gave no evidence of consciously using statistical
principles. These Ss were presumably
drawn from a somewhat different population 
from those of Exp. I, but it would be difficult to 
specify either population precisely. 

Materials and procedure.—The procedure and 
instructions were kept as nearly as possible like
those of Exp. I. The S was confronted by two
packs of 500 cards and was told to judge which
of the two whole packs had the higher mean in
the positive direction on the basis of cards drawn
from the top of the packs and exposed to him
simultaneously. Each time a pair of cards was
exposed, S made a judgment of “right” or “left”
and a confidence rating. Twenty pairs of cards
were shown from each pair of packs. The 
confidence scale was like that of the previous 
experiment, except that values to the left of 
zero represented judgments that the left-hand 
pack had the higher mean, while those to the 
right meant that the right-hand pack had the 


*The senior author is solely responsible for 
Exp. II, which was supported by a Summer 
Research Grant from the Committee for the 
Advancement of Research of the University of 
Pennsylvania. 







TABLE 2 


Characteristics of Pairs of Packs Used
in Exp. II








75 
75 
7S. 
75 
75 | 


7.5 














higher mean. Preliminary practice in identi- 
fying the higher of two numbers in the positive 
direction was given until E was convinced that 
S understood this conception. Each pack of a 
pair was presented equally often to the left and 
to the right. The order of pairs of packs was
counterbalanced so that each pair appeared 
equally often in each temporal position. As in 
Exp. I, each of the two sets of 10 cards in each 
pack was shuffled separately for each S. 

Six pairs of packs were made up from the 
packs used in Exp. I. Their characteristics are 
given in Table 2. The pairs were selected so
as to permit the desired comparisons with respect 
to mean differences and SD’s of the differences 
and also to spread the values of t over a wide 
range. 


Results and Discussion 


Table 3 gives the mean confidence 
ratings of the 36 Ss for each of the 
conditions. It must be kept in mind 
that, while the mean difference be- 
tween packs in a pair was strictly 
controlled, the SD’s of the differences 
between pairs of cards varied from 
S to S as a result of the shuffling 
procedure. The SDdiff’s of the




headings in Tables 2 and 3 are those 
calculated from the SD’s of the two 
packs of a pair on the assumption of 
zero correlation. This will be dis- 
cussed further below. 

As in the previous experiment, the 
data showed great regularity. In 
every appropriate comparison the 
difference in mean confidence ratings 
was in the direction expected by our 
theory. Thus, the ratings increased 
as the mean difference increased and 
as the number of pairs of cards seen 
increased, and the ratings decreased 
with an increase in the SDdiff. In
view of the numbers of comparisons
and the consistency of the outcomes,
it is unnecessary to make special tests
of the reliability of the effects of
varying the mean difference and the
number of pairs of cards seen. In
the case of the variation in SDdiff,
however, where there were only four
comparisons, we have tested the
significance of each of the differences
by t tests. The levels of significance
from these tests were .02 in one
instance and < .01 in the other three,
using both tails of the distribution. 
We may therefore conclude that, as 
in Exp. I, the mean confidence ratings 


TABLE 3 


Mean Confidence Ratings at the 10th and
20th Pairs of Cards from Pairs of
Packs Whose Differences
Varied in Mean and SD


Median SD of distributions of confidence 
ratings = 29 (see text). Numbers in 
parentheses are frequencies of 
“incorrect” judgments. 

(N = 36.) 





SDdiff = 2.8    SDdiff = 10.6





10 Card | 20 Card 
Pairs Pairs 
Seen Seen 


30 (5) 
42 (0) | 59 (1) 
52 (0) | 69 (0) 


50 (3) 



















[Figure 2 plot: ordinate, mean confidence rating; abscissa, calculated t.]


Fig. 2. Mean confidence ratings as a function
of t values for tests of the null hypothesis
that the difference between the means of two
packs is equal to zero. Each plotted point is
the mean over all Ss for a pair of packs with the
indicated mean difference and SDdiff (the latter
calculated for r = 0). Points for 10 and 20
pairs of cards from the same pair of packs are
connected by straight lines; that for 10 pairs of
cards is always the lower of the two.


varied directly with the absolute 
value of the mean and with the 
number of cards seen and that they 
varied inversely with the SD. 

When mean confidence ratings were 
computed without regard to sign, the 
means were significantly smaller (P 
< .01) for the greater SD in all of the 
four possible comparisons. 

The SD’s of the distributions of 
confidence ratings whose means ap- 
pear in Table 3 ranged from 22 to 
34, with a median of 29. (The 
median in Exp. I was 25.) They did 
not vary regularly with any of the 
independent variables, although in 
four of the six comparisons, the SD 
for a pair of packs was greater for the 
twentieth pair of packs than for the 
tenth. 

The numbers in parentheses in 
Table 3 show that the number of 
“incorrect” judgments of the sign of 
the mean difference decreased con- 
sistently with increased number of 
pairs of cards seen and with increased 
absolute value of the mean difference, 
and increased consistently with increase
in the SDdiff. The latter two
findings confirm those of Exp. I. 




In Fig. 2, the mean confidence 
ratings for each condition are plotted 
against the corresponding values of t.
The t’s were calculated as for tests
of the null hypothesis that the mean 
difference between the two packs in a 
pair was equal to zero, with the as- 
sumption of zero correlation between 
the packs. The comments made con- 
cerning the interpretation of Fig. 1 
(Exp. I) are applicable to Fig. 2 also.

Since the range of t occurring in
Exp. II includes all but the smallest 
that occurred in Exp. I, one may ask 
whether similar mean confidence
ratings were obtained for similar t's 
in the two experiments. Close com- 
parison of the two figures indicates 
this to be true for the smaller values 
of t. For values above about 3, 
however, the mean confidence ratings 
of Exp. I lie considerably above those 
for Exp. II. While we believe that 
the latter fact may be significant and 
that it is probably related to the 
differences in complexity or difficulty 
of the tasks in the two experiments, 
we do not wish to put much 
weight upon it, since it is confounded 
with differences in both Ss and Es. 

A striking feature of Fig. 2 is the 
fact that, with one exception, the 
mean confidence ratings for the 
twentieth pairs of cards are higher 
than those for the tenth pairs of 
cards for neighboring values of t.
This can readily be seen by noting 
that the upper end of each curve 
except the first is above the lower end 
of the curve to its right. The same 
effect appears in Fig. 1 at the higher 
levels of t. The magnitude of the 
phenomenon is such that it would 
still be present if the curves were 
drawn with an abscissa representing 
P rather than t. Where it exists, one 
may say that, for a given P, Ss tend 
to give relatively more weight in 
their confidence ratings to the mere 
number of cards or pairs of cards seen 




than a statistician would do under 
the same circumstances. This con- 
stitutes a qualification upon the use 
of the statistician as a model for the 
psychological process of decision- 
making. 


Since in Exp. II the cards were dis- 
played in pairs, it can be asked whether 
Ss treated the two packs of a pair as 
independent or as correlated. Unless 
they suspected trickery, they should 
presumably, on statistical grounds, take 
the former alternative. It is possible 
to provide evidence on this point because 
the two packs were shuffled separately, 
as were the two sets of 10 cards within 
each pack, so that the actual differences 
between pairs of cards as they were seen 
by S varied randomly from S to S.
Thus, we were able to calculate the sum
of the squares of the first 10 differences
from a given pair of packs for each S
and to determine the correlation be- 
tween this measure of variability of the 
differences and the Ss’ confidence at the 
end of the 10 pairs. This was done for 
two pairs of packs, with the following 
outcomes. For Pair 2, the sum of the 
10 squared differences varied from 84.5 
to 205.0, with a median of 145.0. The 
correlation (rho) between these measures 
and the tenth confidence ratings was 
−.14. In the case of Pair 4, the sum
of the squared differences ranged from 
450.0 to 1716.0, with a median of 1139.5, 
and the value of rho was −.11. (The
minus sign indicates an association 
between high confidence and large sum 
of squares.) Since neither correlation 
approaches significance, there is no 
evidence that Ss’ confidence ratings were 
affected by the variability of the differ- 
ences; in other words, it is tenable that 
Ss treated the two packs of a pair as 
independent. It may be mentioned that 
the median SD’s of the differences that 
were used in this analysis were 2.86 and 
10.38 for Pairs 2 and 4, respectively, 
which agreed closely with those calcu- 
lated from the SD’s of the single packs 
on the assumption of zero correlation, 


viz., 2.83 and 10.61. 
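
The calculated values follow from the usual formula for the SD of a difference between two variables. The sketch below assumes that Pairs 2 and 4 each combined two packs with SD's of 2.0 and of 7.5, respectively; this is inferred from the values quoted here and from the pack SD's of Exp. I, not stated in the text. The function name is illustrative.

```python
from math import sqrt


def sd_of_difference(sd_a: float, sd_b: float, r: float = 0.0) -> float:
    """SD of (A - B) given the two SD's and the correlation r between A and B."""
    return sqrt(sd_a**2 + sd_b**2 - 2 * r * sd_a * sd_b)


# Assumed composition of the two pairs analyzed (see the note above).
sd_diff_pair_2 = sd_of_difference(2.0, 2.0)  # zero correlation
sd_diff_pair_4 = sd_of_difference(7.5, 7.5)  # zero correlation

print(f"Pair 2: {sd_diff_pair_2:.2f}; Pair 4: {sd_diff_pair_4:.2f}")
```

With r = 0 the formula reduces to the square root of the sum of the two variances, which reproduces the calculated values of 2.83 and 10.61.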




SUMMARY 


Two experiments were performed to test 
certain implications of theories of decision offered 
by Cartwright and Festinger and by the present 
writers. In the first experiment, Ss judged 
from a small sample of numbered cards whether 
the mean of the whole pack was greater or less 
than zero and reported their confidence in these 
judgments. In the second experiment, Ss 
judged which of two packs had the greater mean 
and likewise reported their confidence. The 
principal outcomes were as follows:


1. Mean confidence ratings were directly 
related to the mean, or mean difference, of the 
sample. 

2. The absolute magnitude of mean con- 
fidence ratings was directly related to the sample 
size, indicating that the decision process was 
cumulative. 

3. The absolute magnitude of mean confidence
ratings was inversely related to the
variability of the sample or samples. 

4. In the case of pairs of packs, Ss appear to 
have treated the two samples as independent. 

5. For samples yielding equal values of P for 
t tests of the hypothesis of a population mean of 
zero, Ss’ mean confidence ratings were greater 
for the larger samples, when t was large. 


Outcome 1 was consistent with both Cartwright
and Festinger’s theory and that of the
authors. Outcomes 2 and 3 were inconsistent 
with the former theory and consistent with the 
latter. Outcome 4 does not appear to be critical 
for the theories. Outcome 5 probably requires 
at least a minor modification of either theory. 


REFERENCES 


1. BILODEAU, E. A. Statistical versus intuitive
confidence. Amer. J. Psychol., 1952, 65,
271-277.

2. CARTWRIGHT, D., & FESTINGER, L. A
quantitative theory of decision. Psychol.
Rev., 1943, 50, 595-621.

3. EDWARDS, W. The theory of decision-making.
Psychol. Bull., 1954, 51, 380-417.

4. FESTINGER, L. Studies in decision: I.
Decision-time, relative frequency of
judgment and subjective confidence as
related to physical stimulus difference.
J. exp. Psychol., 1943, 32, 291-306.

5. FESTINGER, L. Studies in decision: II.
An empirical test of a quantitative theory
of decision. J. exp. Psychol., 1943, 32,
411-423.


(Received April 22, 1955) 





Journal of Experimental Psychology 
Vol. 51, No. 4, 1956 


FAMILIARITY AND RECOGNITION OF NONSENSE SHAPES ¹


MALCOLM D. ARNOULT 


Skill Components Research Laboratory, Air Force Personnel and Training Research Center 


Following Gottschaldt’s suggestion 
(9) that the perception of two- 
dimensional shapes is independent of 
previous experience, a considerable 
number of investigators applied them- 
selves to testing the generality of his 
hypothesis. Typically, these studies 
(1, 5, 6, 7, 10, 11) demonstrated only 
that shape perception was not inde- 
pendent of training under all con- 
ditions. Few have been concerned 
with demonstrating a precise func- 
tional relationship between perception 
and past experience. 

A major obstacle to obtaining such 
functions is the difficulty in quanti- 
fying the previous experience which 
the observer has had with the stimulus 
material. Howes and Solomon 
showed, for example, that recognition 
thresholds for words could be ex- 
pressed as a negative function of the 
logarithm of the Thorndike-Lorge 
frequency count for the same words 
(12). In somewhat the same vein,
Attneave related Ss’ judgments of the
relative frequencies of letters of the
alphabet to the actual frequencies
occurring in English text, and obtained
a power function (4). Studies
of this sort are based upon a quantification
of the frequency of occurrence
of the material in the stimulus
environment, but do not provide a
quantitative statement of the frequency
with which a particular S
has actually experienced the stimulus.


¹ This research was carried out under the
Air Force Personnel and Training Research
Center, Lackland Air Force Base, San Antonio,
Texas, in support of Project 7706. Permission
is granted for reproduction, translation, publication,
use and disposal in whole or in part by
or for the United States Government. The
data were collected and analyzed by A/1C John
F. Mallonee.


Recently Solomon and Postman
“built in” experience with stimuli
by exposing nonsense words with
frequencies varying from 0 to 25 (16).
They then related the frequency of 
stimulation to the recognition thresh- 
old and obtained a curve that could 
not be fitted by a logarithmic func- 
tion, but that could be fitted by an 
inverse logarithmic function. It 
seems clear from these experiments 
that the recognition of verbal ma- 
terials is a negatively accelerated 
monotonic function of the frequency 
with which the materials have been 
experienced, but there is no agree- 
ment as to the form of the 
relationship. 

Some investigations of the role of 
experience in perception have used 
the term familiarity in describing the 
differences in amount of previous 
experience, under the obvious as- 
sumption that familiarity is positively 
related to frequency of stimulation.
Only recently, however, has an attempt
been made to specify the exact
relationship between familiarity (f)
and frequency (n). Noble exposed
unfamiliar and meaningless words
with frequencies ranging from 0 to
25 and then required his Ss to rate
the words on a five-point familiarity
scale (15). The result was a curve
which could best be fitted by a
hyperbolic function of the form

f = s − 2/(Kn + C).

He attached considerable importance
to the fact that the
data could not be fitted as well by a
logarithmic function of the sort common
to psychophysical experiments,
concluding that the frequency-famili- 
arity relationship was based upon 
absolute learning increments rather 
than comparative judgments. It was 
Noble’s opinion that this distinction
would prove important from the 
standpoint of theory construction. 
The two experiments reported here 
were attempts to discover whether the 
functional relationship obtained by 
Noble and by Solomon and Postman 
applied as well to nonverbal materials. 
The experiments were similar to
Noble’s in that Ss were given varying
amounts of experience with the stimuli
and were then asked to rate them on a
familiarity scale. Provision was also
made, however, for obtaining a measure
of recognition (though not a
threshold measure), and an attempt
was made to assess the effects of short
delays between the familiarization
trials and presentation of the rating
scale.


METHOD


Apparatus.—During the familiarization 
training, the stimuli were presented individually 
by means of 2 by 2-in. slides and a projector. 
For purposes of rating familiarity, pictures of 
all the stimuli were mounted on a board and 
shown simultaneously. 

Stimuli.—Some of the problems involved in 
selecting a set of nonsense shapes to be used as 
stimuli in an experiment of this sort have been 
discussed by the writer elsewhere (2). It is 
sufficient to point out here that it is impossible 


Fig. 1. The nonsense shapes used as stimuli.








RECOGNITION OF NONSENSE SHAPES 


at present to specify any method of selecting a 
set of nonsense shapes sufficiently representative 
of all possible shapes that the results of the 
experiment can be uncritically generalized to 
situations in which other stimuli are involved. 
In order to minimize the personal biases of E,
however, the nonsense shapes were constructed 
by an arbitrary method involving the plotting 
and connecting of points determined from a 
table of random numbers. A large number of 
such shapes was constructed, and from these 
was drawn a group judged to have a minimum 
of similarity to one another. The 30 stimuli 
used in the experiment are shown in Fig. 1. 

Subjects.—There were seven groups of 100 Ss
each in the two experiments. Group 0, Group 
1, and Group 2a in Exp. I each contained 45 male 
and 55 female Air Force basic trainees. Group 
2b, Group 3, and Group 5 in Exp. II each con- 
tained 100 male trainees. The seventh group 
(Group X), which was run separately, also con- 
tained 100 male trainees. 

Procedure.—In brief, the procedure was as 
follows. Each S was shown 10 shapes with 10 
different frequencies (1, 2, 3, 4, 5, 8, 10, 15, 20, 
and 25) totaling 93 exposures. Following this 
familiarization training, he was shown all 30 
shapes and required to rate them for familiarity 
on a five-point scale. The scale used was the 
same as the one used by Noble (15). The six
experimental groups were defined by the delay
between familiarization training and familiarity
rating. Experiment I involved delays of 0, 1,
and 2 hr.; in Exp. II the delays were 2, 3, and
5 hr. A measure of recognition was obtained
by dichotomizing the familiarity ratings into
“Familiar” and “Unfamiliar” categories.
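
The exposure schedule and the dichotomization described above can be sketched in a few lines of Python. This is an illustration only, not the procedure as run. The shape numbers, the seed, and the function names are hypothetical, and the cut between “Familiar” and “Unfamiliar” is taken from the scoring rule given under Recognition in the results (any rating above NEVER counts as recognized).

```python
import random

# Exposure frequencies given in the Procedure; one shape per frequency.
FREQUENCIES = [1, 2, 3, 4, 5, 8, 10, 15, 20, 25]
RATINGS = ["NEVER", "RARELY", "SOMETIMES", "OFTEN", "VERY OFTEN"]


def assign_frequencies(shape_ids, rng):
    """Randomly pair each of 10 shapes with one exposure frequency."""
    shuffled = list(shape_ids)
    if len(shuffled) != len(FREQUENCIES):
        raise ValueError("expected exactly 10 shapes")
    rng.shuffle(shuffled)
    return dict(zip(shuffled, FREQUENCIES))


def dichotomize(rating):
    """Collapse a five-point rating into the two recognition categories."""
    if rating not in RATINGS:
        raise ValueError(f"unknown rating: {rating!r}")
    return "Unfamiliar" if rating == "NEVER" else "Familiar"


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
assignment = assign_frequencies(range(1, 11), rng)  # hypothetical shape numbers 1-10
total_exposures = sum(assignment.values())

print(f"total exposures = {total_exposures}")
```

Whatever the random assignment, the 10 frequencies always total 93 exposures, in agreement with the figure given in the text.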

In every session, an equal number of Ss was 
assigned to the three groups. This number 
varied in Exp. I from one to eight, depending 
upon the availability of Ss. In Exp. II, it was 
possible to assign exactly five Ss to each group 
in every session. The familiarization training 
was so scheduled in Exp. I that all Ss used in a 
given session performed the familiarity rating 
simultaneously, whereas in Exp. II the various 
delay groups performed the familiarity rating 
separately. For each session the 30 shapes were 
divided into three groups of 10, and for each 
group of Ss 10 stimuli were assigned randomly 
to the 10 frequencies of stimulation. The order
of presentation of the shapes was varied system- 
atically for each group, as was the arrangement 
of the 30 stimuli shown in each rating session. 
Figure 1 is an example of one such arrangement. 
There was a total of 23 experimental sessions in 
Exp. I and 20 sessions in Exp. II.²


² Unfortunately, these differences between
the experiments were beyond E’s control.




At the beginning of the familiarization 
training, S was told to attend carefully to the
shapes because he would be asked questions 
about them later. Each shape was exposed for 
15 sec. at each appearance, with an interval of 
5 sec. between exposures. The S was not re- 
quired to make any overt response. Following 
this training, S was asked to indicate on his 
answer sheet the answers to the following 
questions: (a) What was the greatest number 
of times you saw any one shape? (b) What was
the least number of times you saw any one 
shape? (c) Altogether, how many different 
shapes did you see? The correct answers were 
25, 1, and 10, respectively. Originally, these 
questions were included only for the purpose of 
convincing S that he had completed his task. 
It was hoped that in this fashion the delay groups 
would be discouraged from talking or thinking 
about the stimuli during the delay period. It 
was discovered later, however, that the answers 
to these questions had considerable intrinsic 
interest. 

During the rating session the following instructions
were given:

“In a moment I am going to show you a large 
number of pictures of shapes similar to the ones 
you were shown earlier. Some of them will be 
the same as the ones you saw earlier, others will 
be new to you. 

“Each shape will have a number written 
above it. I want you to look at all of them 
carefully and write down the numbers of the 
shapes in the proper columns on your answer 
sheet. Each shape must be rated according to 
the number of times you have experienced it. 
The five possible ratings are: 

NEVER—this means you have never seen that
particular shape before.
RARELY—this means you have seen the shape
before at least once, but only rarely.
SOMETIMES—this means you have seen the shape
a few times, but not often.
OFTEN—this means you have seen the shape more
than just a few times, but not very often.
VERY OFTEN—this means you have seen the
shape very frequently.

“Take the shapes in order from 1 to 30 so 
you will be sure not to forget any. Put the 
number of the shape in the proper column 
according to the best judgment you can make. 
This is not a ‘speed’ test, so take the time to 
make the best estimate you can for each shape. 
Does everyone understand? All right, begin.” 

It will be noted that, while these instructions 
do not limit the assignment of familiar ratings to 
the shapes previously seen, they are slanted in a 


Known systematic changes in the characteristics 
of the population of Ss, however, would have 
vitiated comparisons between Exp. I and Exp. 
II simply on the grounds that they were per- 
formed at different times of the year. 







[Figure 2 plot: ordinate, judged familiarity (f); abscissa, frequency (n).]


Fig. 2. The relation between judged familiarity (f) and the frequency with which the stimulus
was presented (n). The number of hours delay between the familiarization exposures and subsequent
rating of familiarity is used as a parameter.


way which makes such an interpretation reasonable.

The 100 Ss in Group X were asked to rate 
the 30 shapes for familiarity without having had 
the familiarization training. The results from 
this group were used to determine the pre- 
experimental or “intrinsic” familiarity differ- 
ences which existed among the stimuli. 


RESULTS AND DISCUSSION


Familiarity.—When the data obtained from Group X were scaled by
the Attneave Method of Graded
Dichotomies (3), it was found that 
there were small consistent differences 
between the stimuli in “intrinsic” 
familiarity. Since it had been in- 
feasible to assign the shapes to fre- 
quencies in a counterbalanced fashion 
in the various sessions, these differ- 
ences in intrinsic familiarity were 
used to correct the raw data from the 
experimental groups for any bias that
may have resulted from the random
assignment of shapes to frequencies. 
It was found, however, that the cor- 
rections introduced only negligible 
changes, so the data are reported in 
terms of the uncorrected scores for 
convenience. 

The relation between familiarity (f) 
and frequency of stimulation (n) for 
the six delay groups in the two experi- 
ments is shown in Fig. 2. It can be 
seen that f is a monotonic, negatively 
accelerated function of n in this ex- 
periment, as it was in Noble’s experi- 
ment. The most remarkable result 
is the failure to obtain differences of 
any consequence between the six 
groups. On the basis of these data, 
it would appear that delays of as 
much as 5 hr. after familiarization 
have no effect on the familiarity 
ratings. In fact, there is some suggestion
that longer delays result in
higher ratings, though the two experi- 
ments were sufficiently different to 
make statistical comparisons between 
them of dubious value. It is re- 
grettable that comparable data are 
not available from Noble’s experi- 
ment, in which verbal materials were 
used, since there is some evidence 
(14) that sharp initial drops in re- 
tention are more characteristic of 
verbal tasks than of nonverbal tasks. 

Recognition.—A stimulus was scored 
as “recognized” if it was rated in 
any familiarity category higher than 
“NEVER.” The percentage of recognition
(R) as a function of n was also
found to be a negatively accelerated 
monotonic function, which rose very 
rapidly and reached an asymptote of 
95% accuracy of recognition at a 
point corresponding to an n value of 
about five. The differences between 
the delay groups were extremely 
small, as indicated by the fact that
the mean percentages of recognition
for the six groups were as follows: 
83.2, 83.5, 83.5, 83.0, 83.0, and 83.1. 
It is obvious that for this particular, 
rather simple, recognition task there 
was no difference in mean percentage 
of recognition as a function of delays 


[Figure: percent recognition (R) plotted against frequency (n).]

Fig. 3. The percentage of Ss recognizing a
stimulus as a function of the frequency with
which the stimulus was presented (n). All delay
groups have been combined.


of as much as 5 hr. Figure 3 shows 
the relation between R and n for all 
six experimental groups combined. 
The curve approaches its theoretical 
maximum so rapidly that it was con- 
cluded that the recognition task was 
too easy to justify any attempt to 
specify the precise functional rela- 
tionship between R and n; therefore, 
curve fitting was not attempted for 
these data. It might be noted, 
however, that Luh (13) found a 
decrement in recognition of verbal 
materials after a delay of 4 hr., while 
no such decrement appeared here. 
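
The scoring rule described above (a stimulus counts as "recognized" when it is rated in any category above NEVER) can be sketched as follows. This is only an illustration: the function names and the example ratings are hypothetical and are not taken from the experiment.

```python
# Rating categories in ascending order of familiarity, as given in the instructions.
RATINGS = ["NEVER", "RARELY", "SOMETIMES", "OFTEN", "VERY OFTEN"]


def is_recognized(rating: str) -> bool:
    """A stimulus is scored as recognized if it is rated above NEVER."""
    return RATINGS.index(rating) > RATINGS.index("NEVER")


def percent_recognition(ratings: list[str]) -> float:
    """Percentage of the supplied ratings that count as recognition (R)."""
    if not ratings:
        raise ValueError("no ratings supplied")
    hits = sum(is_recognized(r) for r in ratings)
    return 100.0 * hits / len(ratings)


# Made-up ratings of one shape by five hypothetical Ss (not data from the study).
example = ["NEVER", "RARELY", "OFTEN", "SOMETIMES", "VERY OFTEN"]
print(percent_recognition(example))  # 80.0
```

Because the rule collapses four of the five categories into one, R carries less information than f, which may be one reason the recognition curve reaches its ceiling so quickly.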

Generalization. It will be recalled 
that S was asked three questions at 
the end of the familiarization period. 
Originally, these questions were in- 
cluded merely as an attempt to con- 
vince S that the experiment was 
concluded. An examination of the 
answers to these questions, however, 
provided a clue for a possible ex- 
planation of the failure of the delay 
periods to have any effect on the 
familiarity ratings. In answer to 
Question c, “Altogether, how many 
different shapes did you see?” the 
mean response of the 600 Ss was 9.80, 
which was very close to the correct 
answer—10. The mean number of 
stimuli actually rated as familiar, 
however, was 15.64, or about six more, 
on the average, than they had said 
they had seen. This result suggests 
that S was working on the principle, 
“When in doubt, call it familiar.” 
Thus, it could be that as doubt in- 
creases with the number of hours 
delay, so also does the number of 
stimuli rated as familiar. 

The application of such a system 
on the part of S would result in the 
number of failures of recognition 
remaining constant while the number 
of “false” recognitions increased with 
increased delay between familiari- 
zation training and familiarity rating. 





TABLE 1

NUMBER OF “FALSE RECOGNITIONS” AND
“FAILURES OF RECOGNITION” FOR THE
VARIOUS DELAY GROUPS
(N = 100 in each group)








Hours Delay 





Experiment I Experiment II 


EEPEL 


False Recognition 











6.72 
4.46 


8.45 
4.38 


7.87 


4.80 | 5.08 | 4.17 





6.15 6.69 





Recognition Failure 





Mean 1.19 | 1.22 
SD 1.41 | 1.33 


09 
1.32 | 1.19 


1.18 
1.20 


AY 














1.19 | 1.21 
l 








Table 1 shows that the first half of 
the prediction holds true, in that the 
number of recognition failures was 
constant across all delay groups. The 
number of false recognitions, however, 
did not increase as predicted. If 
anything, the number of false recog- 
nitions declined with increasing delay, 
though, again, valid statistical com- 
parisons between the two experiments 
are difficult to make. It would be 
interesting to speculate on some of 
the implications of a genuine decline 
in false recognitions as a function of 
several hours delay, particularly in 
conjunction with an increase in rated 
familiarity (as was suggested by the 
curves in Fig. 2). Such a result 
would imply a steepening of the 
generalization gradient associated 
with each stimulus figure as a func- 
tion of the passage of time. What 
is more, the increase in “distinctiveness”
of the stimulus would be in
consequence of a procedure which 
did not involve specific discrimination 
training. If this result is indeed 
reliable, it would suggest that this 
kind of perceptual “learning” (if it
may be called that) is quite different
from the verbal learning with which 
we are more familiar. For the pres- 
ent, however, the writer prefers the 
more conservative conclusion that 
there simply was no decrement in 
recognition performance as a func- 
tion of the amounts of delay used. 
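
The bookkeeping behind the "when in doubt, call it familiar" argument can be checked with a few lines of arithmetic. The sketch below assumes that the shapes rated familiar are the correctly recognized shapes plus the false recognitions, and it uses a recognition-failure mean of about 1.2, read approximately from Table 1; the variable names are illustrative only.

```python
shapes_presented = 10            # correct answer to Question c
mean_reported_seen = 9.80        # mean answer of the 600 Ss to Question c
mean_rated_familiar = 15.64      # mean number of stimuli rated as familiar
mean_recognition_failures = 1.2  # approximate value, assumed from Table 1

# Excess of shapes rated familiar over shapes S said he had seen.
excess_over_report = mean_rated_familiar - mean_reported_seen

# Presented shapes that were actually recognized, on the average.
correctly_recognized = shapes_presented - mean_recognition_failures

# Remaining "familiar" ratings must then be false recognitions.
implied_false_recognitions = mean_rated_familiar - correctly_recognized

print(round(excess_over_report, 2))          # 5.84
print(round(implied_false_recognitions, 2))  # 6.84
```

Under these assumptions the implied number of false recognitions is of the same order as the false-recognition means listed in Table 1.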

Functional relationships.—Since no 
systematic differences had been found 
in the familiarity ratings of the six 
delay groups, the ratings of all groups 
were combined and rescaled prior to 
attempting to determine the func- 
tional relationship between f and n. 
An examination of the combined data 
revealed they violated one of the 
assumptions of the Method of Graded 
Dichotomies, namely, that there be 
equality of ambiguity among the 
stimuli (3). Consequently, the com- 
bined data were scaled according to 
the method devised by Garner and 
Hake (8). In determining the R- 
scale values required by this method, 
it was further discovered that the 
response distributions associated with 
the higher values of n showed a 
marked deviation from normality. 
An attempt was made to correct this 
condition by using only the two points 
lying closest to the median as deter- 
minants of the slopes of the cumu- 
lative frequency curves. This pro- 
cedure produced greater consistency 
of slope among the curves.* The
R-scale values associated with the 
medians of the response curves were 
then used as the f values for the 
combined data. 

Attempts were made to fit these f 
values with both logarithmic and 
hyperbolic functions. In each case 

* This somewhat unorthodox procedure is
defensible in the present case on the grounds
that a very large number of Ss (N = 600) were
involved. The fact that heterogeneity of variance
and nonnormality were problems even
when such a large N was involved has implications
for scaling techniques in general.

it was possible to obtain a correlation 
index, ρ, slightly in excess of .95, but
neither approached the ρ of .998
obtained by Noble in fitting his data 
to a hyperbolic function (15). An- 
other approach, however, was sug- 
gested by the evidence presented 
earlier for the occurrence of con- 
siderable generalization among the 
stimuli during the delay period. 

In effect, this generalization would 
be equivalent to a constant increment 
of familiarity for all the stimuli. The 
increment of familiarity would cor- 
respond to the effect of some number 
of theoretical additional exposures, 
from which it follows that f should be
related, not simply to n, but to n + i,
where i represents the additional
exposures. In the absence of other
information, i was assigned the arbitrary
value of 1. The equation for
this function, shown in Fig. 4, is


f = 2.28 log (n + 1) − .12.


The index of correlation, ρ, has a
value of .997. This value compares 
very favorably with the value ob- 
tained by Noble in his experiment 
using a hyperbolic function. 
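
As a rough illustration of the shape of this function, the fitted equation can be evaluated at a few frequencies. The sketch assumes a base-10 logarithm, which the text does not state explicitly, and the frequencies chosen are illustrative values within the 0 to 25 range rather than the values used in the experiment.

```python
import math


def judged_familiarity(n: float, extra: float = 1.0) -> float:
    """Evaluate f = 2.28 log(n + extra) - .12, assuming a base-10 logarithm."""
    return 2.28 * math.log10(n + extra) - 0.12


# Illustrative frequencies only; not the frequencies used in the experiment.
frequencies = [0, 1, 2, 3, 5, 10, 25]
values = [judged_familiarity(n) for n in frequencies]

for n, f in zip(frequencies, values):
    print(f"n = {n:2d}  f = {f:.2f}")
```

The printed values rise steadily while each added exposure contributes less than the one before, which is what a monotonic, negatively accelerated function implies.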


The rather extraordinary accuracy 
with which the empirical data were 
fitted in both Noble’s experiment and 
the present experiment tempts one to 


[Figure: judged familiarity (f) plotted against frequency (n), with the fitted curve f = 2.28 log (n + 1) − .12.]

Fig. 4. The relation between judged familiarity
(f) and frequency with which the stimulus
was presented (n) for the combined delay groups.


conclude that the functions obtained
are genuinely different and represent
different low-order S-R laws. Noble,
for example, interpreted the relation
between f and n as “... a performance
curve determined by changes in some
hypothetical learning process within the
organism. The response variable f is
thus considered as a magnitude function
of physical events which are analogous
to trials in a contiguity learning
situation ...” and rejected the alternative
interpretation that “judgments of
familiarity are uniquely determined by
perceptual processes of comparison and
discrimination whose dependency upon
the physical variable n is
nonassociational” (15, p. 15). It would appear
fruitless at this time to institute an
argument concerning the “true”
familiarity-frequency relationship; it might be
profitable, however, to examine more
closely some of the differences between
the two experiments. For example,
Noble used verbal materials as stimuli,
and the present experiment used
nonsense shapes. The kind of generalization
which led to the use of the log (n + 1)
transformation may not occur with
verbal materials, in that perhaps S
cannot easily be confused concerning
whether he has ever seen a particular
word before, whereas such confusion
among nonsense shapes may be more
frequent. Another possible important
difference was that Noble required his
Ss to pronounce each word as it
appeared, while no overt response was
involved in the present experiment.
Investigation of these differences might
reveal whether the familiarity-frequency
relation can be specified differentially
according to the stimulus and response
characteristics of the situation, or
whether it will only be possible to say
that f is a negatively accelerated,
monotonic function of n which can be
fitted equally well by a variety of
equations.


SUMMARY 


Nonsense shapes were presented with frequencies
varying from 0 to 25 to Ss who were
later required to rate the same stimuli on a
five-point scale of familiarity. For different groups
of Ss the delay between the familiarization and
rating sessions was 0, 1, 2, 3, or 5 hr. A measure
of recognition was obtained as well by dichotomizing
the ratings into “Familiar” and “Unfamiliar”
categories. A total of seven groups
of 100 Ss each was used in two separate experiments.
The conclusions were:

1. There were no significant differences in 
familiarity as a function of delays of as much as 
5 hr. between the two sessions. 

2. There were no differences in recognition 
as a function of the various amounts of delay. 

3. Familiarity of nonsense shapes was found 
to be a monotonic, negatively accelerated func- 
tion of the frequency of experience. The 
equation was f = 2.28 log (n + 1) − .12, which
had an index of correlation of .997. 


REFERENCES 


1. Allport, G. W. Change and decay in the
visual memory image. Brit. J. Psychol.,
1930, 21, 133-148.

2. Arnoult, M. D. Shape discrimination as
a function of the angular orientation of
the stimuli. J. exp. Psychol., 1954, 47,
323-328.

3. Attneave, F. A method of graded dichotomies
for the scaling of judgments.
Psychol. Rev., 1949, 56, 334-340.

4. Attneave, F. Psychological probability as
a function of experienced frequency. J.
exp. Psychol., 1953, 46, 81-86.

5. Braly, K. W. The influence of past experience
in visual perception. J. exp.
Psychol., 1933, 16, 613-643.

6. Djang, S. S. Visual apprehension of
masked forms. J. exp. Psychol., 1937,
20, 29-59.

7. Duncker, K. The influence of past experience
upon perceptual properties.
Amer. J. Psychol., 1939, 52, 255-265.

8. Garner, W. R., & Hake, H. W. The
amount of information in absolute judgments.
Psychol. Rev., 1951, 58, 446-459.

9. Gottschaldt, K. Über den Einfluss der
Erfahrung auf die Wahrnehmung von
Figuren. Discussed in K. Koffka, Principles
of gestalt psychology. New York:
Harcourt, Brace, 1935. Pp. 155-159.

10. Hanawalt, N. G. Effect of practice upon
the perception of simple designs masked
by more complex designs. J. exp.
Psychol., 1942, 31, 134-148.

11. Henle, M. An experimental investigation
of past experience as a determinant of
visual form perception. J. exp. Psychol.,
1942, 30, 1-22.

12. Howes, D. H., & Solomon, R. L. Visual
duration threshold as a function of word
probability. J. exp. Psychol., 1951, 41,
401-410.

13. Luh, C. W. The conditions of retention.
Psychol. Monogr., 1922, 31, No. 3 (Whole
No. 142).

14. McGeoch, J. A. The psychology of human
learning. New York: Longmans, Green,
1942. Pp. 319-320.

15. Noble, C. E. The familiarity-frequency
relationship. J. exp. Psychol., 1954, 47,
13-16.

16. Solomon, R. L., & Postman, L. Frequency
of usage as a determinant of recognition
thresholds for words. J. exp. Psychol.,
1952, 43, 195-201.


(Received April 20, 1955) 





Journal of Experimental Psychology
Vol. 51, No. 4, 1956


TIME AND INTENSITY AS DETERMINERS
OF PERCEIVED SHAPE 1


H. LEIBOWITZ AND L. E. BOURNE, JR. 


University of Wisconsin 


It has been demonstrated that a 
round object viewed obliquely is 
matched with ellipses which are more 
circular than would be predicted from 
geometrical optics (16). Such data 
are examples of the tendency toward 
“shape constancy,” a theoretical con- 
dition in which perceived circularity 
is constant independent of the angle 
of regard and the corresponding reti- 
nal image pattern. Various factors 
have been shown to affect the extent 
to which shape constancy occurs. 
(For a review, see Graham, 7.) The 
duration of exposure of the test 
object is particularly critical; an 
exposure duration of .01 sec. destroys 
the tendency towards constancy and 
produces matches which are in good 
agreement with the shape of the 
retinal image (13). 

The purpose of the present study is 
to analyze the role of exposure dura- 
tion and luminance as determiners of 
the extent to which constancy occurs. 
To this end, the functional relation 
between the matched shape of an 
obliquely viewed disc and the expo- 
sure duration is determined. The 
effect of luminance is investigated in 
view of the reciprocal relation be- 
tween exposure time and luminance 
below the critical duration of about .1 
sec. (3, 4, 5, 6, 8, 10, 11, 15) and the 
general importance of luminance as a 
variable in vision. The results in- 
dicate that an exposure longer than 
the critical duration is required for 


1 This research was supported in part by a
grant from the University Research Committee 
from funds provided by the Wisconsin Alumni 
Research Foundation. 


maximum shape constancy. As a 
check on the possibility that this may 
be the result of an experimental 
artifact, eye-movement records were 
taken while subjects were making 
shape judgments. 


APPARATUS AND PROCEDURE 


The basic instrument employed for investigating
perceived shape was a modified Dodge
tachistoscope utilizing 4-w. fluorescent tubes
with preheated filaments as sources. In the
stimulus field, which could be exposed from .01
to 1.00 sec. in steps of .01 sec., a white disc 3 cm.
in diameter and 2 mm. thick was mounted on a
turntable, the surface of which was covered with
black “flock” paper (14). The turntable could
be rotated so as to present the disc at various
angles to S’s line of vision. The source providing
illumination for the disc was mounted
directly above and about 8 in. from the surface
of the turntable. The “adapting” field presented
one of a series of 22 ellipses arranged in
order from a circle to an ellipse for which the
axis ratio (ratio of minor to major axis) was
.054. The ellipses, each with a major axis 3 cm.
in length, were drawn mechanically with black
India ink on white vellum paper. Field stops
in the tachistoscope restricted the field of view
for either the disc or any one of the ellipses to a
square subtending 4.8° visual angle. The fields
were separated by a strip 4.8° wide. The S
saw alternately either the disc or one of the
ellipses at a viewing distance of 31 in., the rest
of the field being dark.

The Ss were instructed to choose the ellipse, 
by rotation of appropriate knobs, which “looked 
the most like the disc” and were told to take as 
much time (as many exposures of the disc) as 
was required to make a satisfactory match. All 
Ss were volunteers from elementary psychology 
classes and were unaware of the constancy 
phenomenon, the nature of the apparatus, or the 
purpose of the experiment. No S served for 
more than one session which, even for the slowest 
Ss, was completed within 30 min. All observations
were made binocularly with the natural
pupil.





TABLE 1

AVERAGE AXIS RATIOS OF MATCHED ELLIPSES
AS A FUNCTION OF EXPOSURE DURATION
AT THREE LEVELS OF LUMINANCE





Test-Object Luminance (Millilamberts) 





Exposure 


Duration 1.0 





SD SD 


039 
038 
020 
O87 
074 


Mean 


515 
517 
570 
692 
802 
7H 
B42 


Mean 


114) .524 
064 | 528 
ALL] .566 
052 | .608 
046 | .688 
038 | .670 | 136 
043 | .737 | .145 




















The effect of exposure duration and luminance 
on S’s matches was investigated with the turn- 
table constant at an angle of 30° to S’s line of 
vision (stimulus axis ratio of .500). Seven 
groups of seven Ss each were employed. Three 
groups were tested at seven durations with 
luminance as parameter. The durations employed
were .01, .05, .10, .25, .50, .75, and 1.00
sec., and the luminance values, constant for a
given group, were 10, .1, and .01 ml. Three
additional groups were tested at luminance
values of 10, 1.0, .32, .1, .032, .01, and .005
ml., with duration constant at either 1.0, .1, or
.01 sec.
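
The stimulus axis ratio of .500 quoted for the 30° setting follows from simple projection: a circular disc whose surface makes an angle θ with the line of vision projects as an ellipse whose minor-to-major axis ratio is sin θ. A minimal sketch of that calculation, with a hypothetical function name, is given here.

```python
import math


def retinal_axis_ratio(angle_deg: float) -> float:
    """Minor/major axis ratio of the projected ellipse for a disc whose
    surface makes angle_deg with the observer's line of vision."""
    return math.sin(math.radians(angle_deg))


# The 30-degree setting used in the experiment.
print(round(retinal_axis_ratio(30), 3))  # 0.5

# A disc viewed squarely (90 degrees) projects as a circle.
print(round(retinal_axis_ratio(90), 3))  # 1.0
```

On this reading, a match of .500 corresponds to the retinal image and a match of 1.0 to complete shape constancy, so the matched ratios can be read as positions between those two limits.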

In order to obtain a source of shorter duration 
than is possible with the fluorescent tubes, an
additional group was tested with a Kemlite
electronic DX flash tube providing illuminance 
for the stimulus. This produces a .0005-sec. 
flash of extremely high luminance. Because of 
the difficulties involved in making a photometric 
match with this source, luminance levels are 
specified relative to the minimum level at which 
matches could be made. The Ss made matches 
at a level just high enough so that the disc 
was visible, and at additional levels 3.2, 32, 320, 
3200, 32,000 and 320,000 times this value. 

Eye-movement records, while Ss were making 
shape judgments, were made with an American 
Optical Company ophthalmograph (1). The Ss 
viewed an inclined disc and signaled whether it
appeared “rounder” or “flatter” than photo- 
graphs of the same disc taken at various angles 
of inclination and presented in random order. 
The 15 Ss employed were given two 36-sec. 
runs while making shape judgments. The 
duration of the intervals spent in viewing the 
disc was obtained for the nearest .1 sec. by 
measuring the film records. 


RESULTS 


The average data for shape match- 
ing and their SD’s are presented in 
Tables 1 and 2. The values relating 
matched shape to duration with lumi- 
nance constant are plotted in Fig. 1, 
while Fig. 2 presents the data obtained 
as a function of luminance with dura- 
tion as parameter. The results for the 


TABLE 2

AVERAGE AXIS RATIOS OF MATCHED ELLIPSES AS A FUNCTION OF LOG LUMINANCE
AT FOUR DURATIONS OF EXPOSURE





Duration of Exposure (Seconds) 





Luminance 


(Millilamberts) 0.1 


Oo 
Log Relative 





Mean SD 
A9T 
522 
524 
539 
612 
61 5 
644 


038 
030 
O55 
073 
053 
073 
O85 


033 
O46 
072 
078 
110 
154 














O49 


Luminance* 
Mean 


ADI 
520° 
5248 
546 
512 
526 
543 




















a 6 Ss.
b 4 Ss.
c 3 Ss.
d 2 Ss.


* Expressed relative to the lowest level at which matches could be made. 





[Figure: axis ratios of matched ellipses plotted against duration of exposure (seconds); reference lines mark the law of shape constancy and the law of the retinal image.]

Fig. 1. Mean matched axis ratios for a disc
inclined 30° to the horizontal plane as a function
of exposure duration with the indicated luminance
values as parameter.


.0005-sec. duration are not plotted as 
they are nearly identical with the data 
for the .01-sec. duration at comparable
luminance levels. 

The relationship between matched 
shape and either luminance or dura- 
tion is similar. All functions rise 
rapidly at first with an increase in 
either variable, and then more slowly, 
before approaching a limiting value. 
For conditions near threshold, the 
axis ratios of the matched ellipses 
agree with the shape of the retinal 
image as would be predicted from 
geometrical optics (law of the retinal 





[Figure: axis ratios of matched ellipses plotted against log luminance (millilamberts); reference lines mark the law of shape constancy and the law of the retinal image.]

Fig. 2. Mean matched axis ratios for a disc
inclined 30° to the horizontal plane as a function
of luminance with the indicated exposure
durations as parameter.

[Figure: percent of intervals plotted against duration (seconds).]

Fig. 3. Percentage of observation time
intervals spent viewing the test object while
making shape judgments.


image). As either luminance or dura- 
tion is increased, the axis ratios be- 
come larger, tending to approach the 
theoretical line representing the law 
of shape constancy. Except for the 
near threshold conditions, all data fall 
between the two theoretical extremes. 
This is typical in visual perception; 
the data represent a compromise be- 
tween retinal image properties and 
object constancy. 

The eye-movement records were 
measured to the nearest .1 sec. and 
the percentage of cases falling in .1- 
sec. intervals are plotted in Fig. 3. 


Discussion 


The data of the present experiment 
indicate that the function relating per- 
ceived shape to either duration of ex- 
posure or to luminance increases rapidly 
at first, and then more slowly, before 
approaching a limiting value. A hypothesis
which may be considered in
interpreting the observed effects resulting 
from variation of luminance is that 
perceptual constancies depend upon the 
presence in the visual field of stimuli in 
addition to the discriminative stimulus 
(7). This is consistent with the demon- 
stration that the tendency toward bright- 
ness constancy (12) and size constancy 
(9) is destroyed when S’s visual field is
limited to the discriminative stimulus
by a reduction screen or similar device. 
The nature of these “additional” stimuli 
(probably cues for slant, depth, texture, 
etc.) and the mechanism by which they 
influence the tendency toward constancy 
may not be known with certainty, but it 
is a reasonable assumption that their 
luminance values must be high enough 
to permit discrimination by the observer. 
As luminance is lowered, visual acuity 
and intensity discrimination (2) are 
reduced, as is the tendency toward shape 
constancy. Thus, acuity, intensity dis- 
crimination, and the tendency toward 
shape constancy are seen to be similarly 
related to the same independent variable, 
luminance. In line with the assumption 
of the importance of “additional” cues 
in perception, the observed effect of 
luminance on perceived shape may be 
attributed to the impairment of acuity 
and intensity discrimination for these 
“additional” stimuli in the visual field. 
With respect to the variable of expo- 
sure duration, it will be convenient to 
consider separately durations below and 
above the critical duration of .1 sec. 
Below critical duration the same visual 
effect may be produced for the absolute 
threshold (4, 8), differential threshold 
(6), grating acuity (5, 15), and span of 
attention (10) provided the product of 
intensity and time (i.e., the total energy) 
remains constant. This reciprocal relation,
the Bunsen-Roscoe law, means that, for
the indicated visual measures, reduction
of exposure time is equivalent,
below critical duration, to reduction of
luminance. Since in the present study
it is demonstrated that luminance is a 
variable in shape perception, some of the 
observed effects below .1 sec. may be 
attributed to the reciprocity effect. 
However, not all of the variation below
critical duration can be interpreted in
the same manner. If duration of exposure
were important only as a determiner
of the total luminous flux, the
curves relating a visual measure to 
luminance should be superimposable by 
a shift along the luminance axis pro- 
portional to the exposure durations 
employed. This is true for intensity
discrimination (6) and grating acuity (5,
15) but not for perceived shape, as can
be seen from curves for the .1-sec. and
.01-sec. durations of Fig. 2 and the data
for the .0005-sec. exposure listed in 
Table 2. It is apparent that exposure 
duration has an effect on perceived shape 
in addition to its relationship to the 
total stimulus energy. 
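
The reciprocity argument can be made concrete with a small calculation. If only the product of luminance and time mattered below the critical duration, the curve for one duration should coincide with the curve for another after a shift along the log-luminance axis equal to the log of the ratio of the two durations. The sketch below illustrates that prediction; the function names are hypothetical, and it describes the Bunsen-Roscoe expectation, not the shape data themselves.

```python
import math


def stimulus_energy(luminance_ml: float, duration_sec: float) -> float:
    """Total energy as the product of intensity and time (Bunsen-Roscoe)."""
    return luminance_ml * duration_sec


def predicted_log_shift(duration_a: float, duration_b: float) -> float:
    """Shift along the log-luminance axis expected between the curves for
    two durations if exposure time acted only through total energy."""
    return math.log10(duration_a / duration_b)


# Equal energy: 1.0 ml. for .01 sec. and .1 ml. for .1 sec.
print(math.isclose(stimulus_energy(1.0, 0.01), stimulus_energy(0.1, 0.1)))  # True

# Expected separation of the .1-sec. and .01-sec. curves, in log units.
print(round(predicted_log_shift(0.1, 0.01), 2))  # 1.0
```

A departure of the obtained curves from this one-log-unit superposition is what the text takes as evidence that exposure duration affects perceived shape beyond its contribution to total energy.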

Of particular interest is the pronounced 
effect on the curves, relating matched 
shape to duration, for values above .1 
sec. (Fig. 1). These functions continue 
to rise beyond the critical duration, 
leveling off at about .5 sec. The eye- 
movement records were taken to check 
the possibility that the time required to 
shift fixation from the comparison to the 
test field may account for this result. 
The eye-movement data (Fig. 3) indicate 
that 95% of the intervals spent in view- 
ing the test object are longer than .2 to 
.3 sec. The mean duration is .48 sec. 
Thus, when time is not restricted, the 
duration of observation of the test object 
is of the same order of magnitude as the 
exposure duration required for the maxi- 
mum tendency toward constancy in the 
tachistoscope. It appears, therefore, 
that the observed diminution of the con- 
stancy effect can be attributed to the 
reduction of the observation time of the 
test object below the interval normally 
utilized while making shape judgments. 
An adequate theory of shape perception 
must account for the important effect 
of exposure duration which is effective in 
addition to its role in the reciprocity 
relationship of vision. 


SUMMARY 


1. The function relating matched shape to 
exposure duration and to luminance was 
determined by matching ellipses with an ob- 
liquely viewed disc. 

2. For near-threshold stimulus conditions, 
the axis ratios of matched ellipses are in agree- 
ment with predictions made on the basis of 
retinal image theory. 

3. With increase in either duration or 
luminance, the matched axis ratios become 
larger. These functions increase rapidly at
first, and then more slowly, before approaching 
a limiting value. 







4. The diminution of the tendency toward 
perceptual constancy resulting from reduction 
of luminance is attributed to the impairment of 
visual acuity and intensity discrimination for the 
“additional” stimuli in the visual field. 

5. Some of the variation due to reduction of 
exposure below critical duration can be at- 
tributed to the reciprocal relation between time 
and intensity. However, there is an effect 
both below and above critical duration which 
represents the influence of exposure time as a 
determiner of perceived shape in addition to its 
role in the reciprocity relationship. 

6. Eye-movement records, taken while Ss 
were making shape judgments, confirm the 
finding that an exposure longer than the critical 
duration is required to produce the maximum 
tendency toward shape constancy. 


REFERENCES 


1. American Optical Company. “Reading”
in the class room; ... with the ophthalmograph.
Southbridge, Mass.:
American Optical Co., 1937.

2. Bartley, S. H. The psychophysiology of
vision. In S. S. Stevens (Ed.), Handbook
of experimental psychology. New York:
Wiley, 1951. Ch. 24.

3. Brown, R. H. Velocity discrimination and
the intensity-time relation. J. Opt. Soc.
Amer., 1955, 45, 189-192.

4. Graham, C. H., & Margaria, R. Area
and the intensity-time relation in the
peripheral retina. Amer. J. Physiol.,
1935, 113, 299-305.

5. Graham, C. H., & Cook, C. Visual acuity
as a function of intensity and exposure
time. Amer. J. Psychol., 1937, 49,
654-661.

6. Graham, C. H., & Kemp, E. H. Brightness
discrimination as a function of the
duration of the increment in intensity.
J. gen. Physiol., 1938, 21, 635-650.

7. Graham, C. H. Visual perception. In
S. S. Stevens (Ed.), Handbook of experimental
psychology. New York: Wiley,
1951. Ch. 23.

8. Hartline, H. K. Intensity and duration
in the excitation of single photoreceptor
units. J. cell. comp. Physiol., 1934, 5,
229-247.

9. Holway, A. H., & Boring, E. G. Determinants
of apparent visual angle with
distance variant. Amer. J. Psychol.,
1941, 54, 21-37.

10. Hunter, W. S., & Sigler, M. The span of
visual discrimination as a function of
time and intensity of stimulation. J. exp.
Psychol., 1940, 26, 160-179.

11. Karn, H. W. Area and the intensity-time
relationship in the fovea. J. gen.
Psychol., 1936, 14, 360-369.

12. Katz, D. The world of colour. (Trans.
by R. B. MacLeod and C. W. Fox.)
London: Kegan Paul, 1935.

13. Leibowitz, H., Mitchell, E., & Angrist,
N. Exposure duration in the
perception of shape. Science, 1954, 120,
400.

14. Nowetz, M. A light absorbent surface.
Amer. J. Psychol., 1951, 64, 109-110.

15. Niven, J. I., & Brown, R. H. Visual
resolution as a function of intensity and
exposure time in the human fovea.
J. Opt. Soc. Amer., 1944, 34, 738-743.

16. Thouless, R. H. Phenomenal regression
to the “real” object. I, II. Brit. J.
Psychol., 1931, 21, 339-359; 22, 1-30.


(Received April 1, 1955) 





Journal of Experimental Psychology
Vol. 51, No. 4, 1956


DIRECTION AND MAGNITUDE OF RESPONSE ERRORS
IN A HORIZONTAL DISPLAY-CONTROL PATTERN 1


LEON T. KATCHMAR,2 SHERMAN ROSS, AND T. G. ANDREWS
University of Maryland


This experiment investigates the 
direction of response errors in a 
perceptual-motor task using a hori- 
zontal display-control pattern. A 
number of studies on the nature of 
the stimulus-response relations in such 
tasks as related to spatial corres- 
pondence (1, 7, 13), stimulus-response 
compatibility or congruency (3, 7, 9, 
10), population stereotypes (1, 10, 14) 
and stimulus complexity (11), have 
appeared. In general, the results 
show that better performance results 
when the response (control) pattern 
closely approximates the display pat- 
tern and when the response require- 
ments take into account population 
stereotypes. 

Several workers have noted the 
occurrence of “shifts of attention” to 
particular portions of the display as 
measured by response accuracy in 
horizontal display-control patterns. 
For example, Conrad (5, 6) reports 
this type of phenomenon in his 20- 
dials test under high speed conditions. 
He found that as the speed of stimulus 
presentations increased, response ac- 
curacy decreased at the extremes of 
the display and increased for the
central positions. Woessner (15) used 
a horizontal display of five lights and 
reports that response accuracy, under 
conditions of high speed, is greatest 
for the second and fourth positions. 


1 This experiment is one of a series of studies 
on behavior decrement performed under Con-
tract No. DA-49-007-MD-222 between the
Medical Research and Development Division, 
Office of the Surgeon General, Department of 
the Army, and the University of Maryland. 

2 Now at the Human Engineering Laboratory, 
Aberdeen Proving Ground, Maryland. 




Under lower speed conditions no 
positional accuracy was found. A 
partial explanation for this shifting 
may be found in the work of Bahrick, 
Rankin, and Fitts (2), which verified 
the hypothesis that an increase in 
incentive results in increased per- 
ceptual selectiveness favoring those 
parts of the stimulus field which are 
interpreted by S as most relevant to 
the expected reward. 

An additional problem in display- 
control relationships which has not 
been widely investigated is that of 
the direction and magnitude of re- 
sponse errors. In a horizontal dis- 
play-control arrangement two types 
of errors can be made. The first
type of error is due to a failure to
respond to a particular stimulus.
The second type of error occurs when
an incorrect response is made to a
particular stimulus. When an error
occurs in responding to a horizontal 
display-control arrangement, the error 
can be described as being to the right 
or left of the correct position in 
addition to being X positions re- 
moved. Of course, the direction of 
errors is limited at the two ends of 
the control panel. 

The purpose of this study is to 
investigate the direction and magni- 
tude of response errors in a perceptual- 
motor task using a horizontal display- 
control arrangement. Specifically the 
problem is to determine if the location, 
direction, and magnitude of response 
errors change as a function of the 
number of stimuli, the distance be- 
tween stimuli, and the type of re- 
sponse made (verbal or motor). 







METHOD
Apparatus 


Display panels.—The different numbers of 
lights used were 5, 9, and 11, and the distances 
between the lights were 3, 5, and 7 in. This 
necessitated the construction of nine separate 
display panels. The lengths of these panels 
were 18, 30, and 42 in. for 5 lights; 30, 50, and 
70 in. for 9 lights; and 36, 60, and 84 in. for 11
lights. The panels were made from flat wooden
boards 6 in. wide. The stimulus lights were
mounted behind 1-in. holes in the display panels,
and covered with ground-glass discs.

Programmer.—The lights always were presented
in a random order at the rate of one light
every .6 sec. The duration of the light signal
was .5 sec. This programmer was a plastic
drum on which were mounted metal pegs that
activated microswitches.

Recorder and control unit.—The recorder and
control unit were used only for motor responses.
The control unit was mounted on a small table
and consisted of 11 removable keys in order to
conform to the various display conditions.
These keys were spaced 3 in. apart, regardless
of the distance between the lights in the display
panel. An automatic kymographic recording
was made of the Ss' responses. For verbal
responses the stimulus light at the left end of
the display panel was labeled 1, etc. The S
was required to call out the position of the
stimulus, while E recorded the responses on
prepared data sheets.

Subjects and procedure.—The Ss were 36 male
undergraduate students who volunteered their
services. All Ss reported themselves to be
right-handed. These 36 Ss were first divided into
two groups: a verbal response group and a
motor response group. Within each of these
two main groups the Ss were further divided
into three subgroups—one group for each of the
different distances between lights: 3, 5, and
7 in. Thus we had six groups with 6 Ss per
group. Each group of Ss was tested for each
of the 5-, 9-, and 11-light conditions. The order
of exposure to the different light conditions was
counterbalanced within each group.

Each S received four testing sessions, one
session on each of four days, lasting approximately
5 min. The first session was used for
orientation and practice. The remaining three
sessions were used for actual testing purposes.
A session consisted of having S respond to nine
presentations, in random order, of the stimulus
lights in the display panel. The Ss were instructed
to respond as quickly and accurately
as possible, and to attempt to make a response
to every stimulus.

Fig. 1. Mean number of errors for each
signal position for the 5-, 9-, and 11-light
displays. Verbal and motor response curves are
shown separately.

Analyses.—The results obtained for analysis
were: (a) number of errors made to each of the
signal positions in the various displays, (b)
direction of errors made to each of the signal
positions in the various displays (number of
errors to the right minus errors to the left of a
given signal position), and (c) magnitude of
direction errors for each of the signal positions
(magnitude of errors to the right minus magnitude
of errors to the left). Magnitude refers
to the number of positions removed from the
correct position.

Each of these measures was analyzed separately
for the 5-, 9-, and 11-light displays by
use of the analysis of variance technique. In
each of the analyses the subgroup variances
were homogeneous by Bartlett's criterion (8).
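The three measures defined under Analyses can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout (pairs of signal position and response position, numbered from the left), and the sample trials are assumptions made for the example and are not data from the study.

```python
from collections import defaultdict


def directional_measures(trials):
    """Score (signal_position, response_position) pairs per signal position.

    Positions are assumed to be numbered from the left, starting at 1.
    Returns, for each signal position that drew at least one error:
      errors    -- number of incorrect responses
      direction -- errors to the right minus errors to the left
      magnitude -- positions removed to the right minus positions
                   removed to the left (sum of signed offsets)
    """
    scores = defaultdict(lambda: {"errors": 0, "direction": 0, "magnitude": 0})
    for signal, response in trials:
        offset = response - signal
        if offset == 0:
            continue  # correct response, nothing to score
        row = scores[signal]
        row["errors"] += 1
        row["direction"] += 1 if offset > 0 else -1
        row["magnitude"] += offset
    return dict(scores)


# Hypothetical trials: four presentations of Position 3, one of Position 1.
trials = [(3, 3), (3, 4), (3, 2), (3, 5), (1, 2)]
scores = directional_measures(trials)
print(scores)
```

At Position 1 every error necessarily falls to the right, which is the end-of-panel limitation noted in the introduction.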


RESULTS 


Mean errors for each signal position.
Figure 1 shows the principal results
obtained for the mean number of
errors for each signal position in the
5-, 9-, and 11-light displays.3 Verbal
and motor response curves are shown
separately. All errors were response
errors, i.e., no errors of omission were
made. A summary of the analyses
for the 5-, 9-, and 11-light displays is
shown in Table 1.

3 Complete tables of data showing Mean
Number of Errors and Mean Directional Errors
for Each Signal Position have been filed with
the American Documentation Institute. Order
Document No. 4780, remitting $1.25 for microfilm
or $1.25 for photocopies.


TABLE 1

ANALYSES OF VARIANCE OF MEAN ERRORS FOR EACH OF THE LIGHT DISPLAYS

                                                  Mean Square
Source                                Five Lights   Nine Lights   Eleven Lights

Between Responses                     24.200***     50.567***     58.343***
Between Distances                       .039          .281        10.245*
Between Signal Positions                .822         9.359***     68.624***
Responses X Distances                   .217         7.929         7.018
Responses X Positions                   .322         3.019         9.038**
Distances X Positions                   .343         2.763         2.145
Responses X Distances X Positions      3.293*        3.139         2.312
Residual Within                        1.356         2.809         2.400

* Significant at the .05 level.
** Significant at the .01 level.
*** Significant at the .001 level.

In each case the motor response
groups show a significantly greater
number of errors than the verbal
response groups. No consistent effects
were found for the various
distances between the lights in the
display panels. There is, however, a
suggestion that for the 11-light display
the distance between lights may produce
an effect. It was found that
the fewest errors were associated with
the 5-in. distance, followed by the 3-
and 7-in. distances, respectively.

Fig. 2. Mean directional errors for each
signal position for the light conditions. Motor
and verbal responses are shown separately.

The significance associated with the 
signal positions for the 9- and 11-light 
displays is for combined verbal and 
motor responses. The original means 
showed greater response accuracy to 
positions on the left side of the dis- 
plays. Considered separately, it will 
be noticed that the distribution of 
errors for the verbal condition shows 
a skewness toward Position 1,
while the distribution of errors for 
the motor condition is more nearly 
symmetrical. A bimodality is noted 
in each of these figures. 

While there is no direct method of 
testing the significance between the 
5-, 9-, and 11-light display conditions, 
it is clear from Fig. 1 that, with the 
exception of the 5-light condition, 
positional accuracy is dependent upon 
the type of response made. 

Directional errors.—Figure 2 shows
the principal results obtained for the
mean direction errors for each signal
position in the 5-, 9-, and 11-light
displays.4 Verbal and motor responses
are shown separately. A
summary of the analyses of these
directional errors is shown in Table 2.

4 See footnote 3.

No significant sources of variation 
were found for the 5-light displays. 
This indicates that the errors were 
fairly well distributed around each of 
the signal positions except for Positions
1 and 5. As was mentioned
earlier this is an artifact of the
measure used.

For the 9-light displays two of the 
main sources, responses and distances, 
were significant beyond the .01 level. 
The motor response groups committed 
errors which were predominantly to 
the right, while the opposite was true 
for the verbal response groups. The 
effects of distance between lights for 
the combined responses showed that 
for the 3- and 5-in. distances the error 
tendency was to the left, while for the 
7-in. distance the error tendency was 
to the right. 

The results of the analysis for the
11-light displays show that the main
sources of responses and positions,
and the interaction of responses and
positions were significant beyond the
.01 level. The over-all directional
tendency for the motor response
groups was to the right, while for the
verbal response group the tendency
was to the left. The interaction of
responses and positions is attributable
to Positions 6, 7, 8, and 9. The
average direction of errors for the
verbal group for these positions is to
the left, while for the motor response
group the average direction to these
positions is to the right.

Magnitude of direction errors.—The 
results for this measure will not be 
presented, because only 6% of the 
errors made were more than one 
position removed. The results ob- 
tained were, for all practical purposes, 
identical to those obtained for simple 
directional errors presented earlier. 


TABLE 2

ANALYSES OF VARIANCE OF DIRECTIONAL ERRORS FOR EACH OF THE LIGHT DISPLAYS

                                                  Mean Square
Source                                Five Lights    Nine Lights   Eleven Lights

Between Responses                     [illegible]    164.65***     147.125***
Between Distances                     [illegible]     61.505***      6.909
Between Signal Positions              [illegible]      8.472        16.410**
Responses X Distances                 [illegible]     11.454         5.358
Responses X Positions                 [illegible]     14.486        21.651***
Distances X Positions                 [illegible]      6.516         4.915
Responses X Distances X Positions     [illegible]      2.068         5.303
Residual Within                       [illegible]      7.432         4.376

* Significant at the .05 level.
** Significant at the .01 level.
*** Significant at the .001 level.


DISCUSSION


One possible explanation for the relationship
between verbal responses and
left-direction errors is that in such tasks
as reading and writing the normal sequence
is from left to right. It may have
been easier for Ss to orient themselves
to the left rather than to the right side
of the display. The results seem to
support this interpretation since there
were relatively fewer errors made to the
left side of the displays. A tentative
explanation for the right-directional
tendency of response errors for the
motor response groups is provided by
the work of Brown, Knauft, and Rosenbaum
(4). In their study on positioning
reactions, these workers report that short
distances (10 cm. or less) are underestimated,
while longer distances (20
cm. or more) are overestimated. The
Ss in the present study were allowed to
seat themselves comfortably in front of
the control panel. It was noticed that
most of the Ss sat in such a position that
there were more response keys to the
right of the body than to the left. All
Ss reported themselves to be right-handed.
This meant that the right limb
had to travel a greater distance to the
right side of the body than to the left
side. Pursuant to the findings of Brown,
Knauft, and Rosenbaum we might expect
an overestimation of the positions in the
right half of the response panel.


SUMMARY 


This experiment investigated the effects of 
certain display-control variables on the number 
of errors and the direction of errors committed 
in a perceptual-motor task. Thirty-six male 
college students served as Ss in the experiment. 
The results permit the following conclusions: 

1. Performance on the task used is signifi- 
cantly affected by the type of response made. 
The greater number of errors is associated with 
the motor response. 

2. Accuracy for certain signal or response 
positions differs with the type of response. For 
verbal responses the positions in the left half of 
the 9- and 11-light displays were more accurate. 
For motor responses the positions at the ends 
of the display showed the greatest accuracy. 
No differential accuracy was found for the 5- 
light displays. 

3. Directional errors appear to be a function 
of the type of response made. For verbal 
responses errors are predominantly to the left, 
while for motor responses errors are predomi- 
nantly to the right. 

4. Magnitude of directional errors was not
found to be significant. Only 6% of the errors 
committed were more than one position removed. 

5. Distance between signal positions does not 
prove to be significant for any but the 11-light 
display. 


REFERENCES 


1. Anderson, N. H., Grant, D. A., &
Nystrom, C. O. Performance on a repetitive
key pressing task as a function
of the spatial positioning of the stimulus
and response components. USAF,
WADC Tech. Rep., No. 54-76, 1954.

2. Bahrick, H. P., Rankin, R. E., & Fitts,
P. M. Effect of incentives upon reactions
to peripheral stimuli. J. exp.
Psychol., 1952, 44, 400-406.

3. Bowen, J. H. Effects of preliminary types
of training on subsequent discriminative
motor learning. Unpublished doctor's
dissertation, Univ. of Maryland, 1955.

4. Brown, J. S., Knauft, E. B., & Rosenbaum,
G. The accuracy of positioning reactions
as a function of their direction and extent.
Amer. J. Psychol., 1948, 61, 167-182.

5. Conrad, R. The effects of very fast speeds
on multiple dial watching. Med. Res.
Council, A. P. U., 1949, 112-115.

6. Conrad, R. Speed and load stress in a
sensori-motor skill. Brit. J. industr.
Med., 1951, 8, 1-7.

7. Deininger, R. L., Fitts, P. M., & Noble,
M. E. Stimulus response compatibility:
II. The effect of two stimulus variables
on a single set of responses. Paper read
at Midwest. Psychol. Ass., 1952. (Abstract)

8. Edwards, A. L. Experimental design in
psychological research. New York: Rinehart,
1950.

9. Fitts, P. M., & Seeger, C. M. S-R compatibility:
Spatial characteristics of
stimulus and response codes. J. exp.
Psychol., 1953, 46, 199-210.

10. Fitts, P. M., & Deininger, R. L. S-R
compatibility: correspondence among
paired elements within stimulus and
response codes. J. exp. Psychol., 1954,
48, 483-492.

11. Gregg, L. W. The effect of stimulus complexity
on discriminative responses. J.
exp. Psychol., 1954, 48, 289-297.

12. Knowles, W. B., Newlin, E. P., & Garvey,
W. D. The effect of speed and load on
display-control relationships. J. exp.
Psychol., 1953, 46, 65-75.

13. Morin, R. E., & Grant, D. A. Learning
and performance on a key-pressing task
as function of the degree of spatial
stimulus-response correspondence. J.
exp. Psychol., 1955, 49, 39-47.

14. Ross, S., Shepp, B. E., & Andrews, T. G.
Response preferences in display-control
relationships. J. appl. Psychol., 1955,
39, 425-428.

15. Woessner, B. L. Effects of stress and
display-control relationships on response
discrimination. Army Med. Res. Develpm.
Div., Off. Surgeon Gen., Dept. of
the Army, Tech. Rep., No. 21, Project
DA-49-007-MD-222 (O.I. 18-52), Univ.
of Maryland, 1953.

(Received April 26, 1955) 





Journal of Experimental Psychology 
Vol. 51, No. 4, 1956


LATENT LEARNING IN THE THREE-TABLE APPARATUS 


WILLIAM F. OAKES 


University of Minnesota 


This is an experiment on latent 
learning of the second type listed by 
Thistlethwaite (4), exemplified by the 
Seward (3) experiment using a single 
T maze. In Seward’s experiment, 
hungry rats were allowed to explore 
the maze in groups of five or six for 
three days, and were fed immediately 
afterward in their home cages. On 
the test day the rats were fed directly 
(not following a run) in the goal box, 
then removed to the starting box. 
He found that the rats could make the 
correct response when both intra- and 
extramaze cues were present. 

In the present experiment an ap- 
paratus similar to that used by Maier 
(1) in his three-table reasoning prob- 
lem was used. In the Maier experiment
rats were allowed to explore the
apparatus consisting of three tables
(providing different stimulus com- 
plexes) connected by runways. On 
the test day they were allowed to eat 
a little food on one of the tables, then 
were put on one of the other tables to 
see if they could find the way directly 
back to the table on which they had 
just been fed. In the Maier design 
the tests were run with the same Ss 
on many successive days, with a 
different table being correct each day. 
After the first day, however, the run- 
ning of the pathways had been differ- 
entially reinforced, so that it was no 
longer a test of latent learning. 
Maier reported (2) that 75% of the 
choices were correct on the first trial. 

The present experiment is basically 
the same as Maier’s and Seward’s, 
except that Ss are alone during ex- 
ploration, more exploration is per- 
mitted, and S is not permitted to eat
in the apparatus on the test day. A
detention cage is also used to lessen 
possible reinforcing effects of return 
to living cage. Thus the present 
experiment is a test for latent learning 
with a design that attempts to elimi- 
nate, insofar as possible, all reinforcers 
from the situation. 


METHOD


Subjects.—Twenty-four naive male albino
rats were used. They were 9 days old at the 
beginning of the experiment. 

Apparatus.—The apparatus used consisted of
three 12 by 14-in. platforms connected by 6-ft.
runways 2 in. wide. Figure 1 shows a diagram
of the apparatus and the experimental room.
The platforms and runways were elevated about
4 ft. above the floor. Platform A had a metal
washboard surface, B had a glass surface, and
C had a coarse sandpaper surface. Barriers
were provided that could be used to block each
platform off from the rest of the apparatus.
Runways and barriers were painted a dull gray.
Two such pieces of apparatus were assembled 
in a large experimental room so that two rats 
could be run simultaneously. A partially covered 
6-w. bulb was suspended from the ceiling about 
24 in. above the center of each apparatus, 
providing very dim illumination in order that 
E could observe the movements of the rats on 
the apparatus. 

Procedure.—The experimental procedure can 
conveniently be divided into three phases: the 
familiarization, feeding, and test phases. In the 
familiarization phase Ss were placed on the 
apparatus and allowed to explore, with no food 
present. In the feeding phase S was fed on one 
of the platforms. On the test S was placed on 
a platform other than the feeding platform. 

The Ss were kept in individual living cages
during all phases of the experiment. They were
all treated individually, so that each S was alone
at all times—in the apparatus, in the detention
cage, and in its living cage. During the familiarization
phase S would be taken from the living
cage, placed on one of the three platforms, and
allowed to explore the apparatus for 15 min. It
was then placed in a detention cage for 7.5 min.,
then replaced in its living cage. The Ss were
run a total of 12 days during the familiarization
phase. The E recorded the wanderings of Ss
and the time each S spent on each of the platforms.
Each S was placed on each of the three
platforms an equal number of times in randomized
order. During the familiarization
phase Ss were fed in the morning and run in the
evening so that they were run under 11-15-hr.
food deprivation.

Fig. 1. Diagram of experimental room.
(A: washboard surface; B: glass surface;
C: sandpaper surface.)

During the feeding phase the runways were
blocked off from the platforms. Thus S could
not leave the platform on which it was placed.
During this phase a pan full of dry food was kept
on each platform. Each S was assigned one
platform (randomly) on which it was placed for
15 min. each day on three successive days. The
Ss were not fed except on the platform during
this phase; thus, on the first feeding day Ss had
been deprived of food for 35-39 hr., and on the
second and third days for 24 hr. No record was
kept of the amount of food eaten by Ss, but all
of them ate some food. Water was always
available in the home cages.

In the test phase S was placed on a platform
other than the one on which it had been fed
during the feeding phase, and a correct response
consisted in S's running directly to the platform
on which it had previously been fed. The platforms
on which Ss were placed for the test were
chosen so that the correct response for half the
rats (randomly chosen) would be a right turn
and for the other half a left turn. No food was
present on the apparatus during this phase.


RESULTS


Of the 24 Ss, all left the platform 
on which they were placed and 17 
made the correct response on the test. 
Three of the 24 Ss, however, had never 
left the platform on which they were 
placed during the familiarization 
phase of the experiment, and had no 
opportunity to learn the layout of the 
apparatus. When these three non- 
exploring rats are eliminated from 
consideration, 16 of the remaining 21 
rats made the correct response on the 
test. Correct responses were about 
equally distributed among Ss fed on 
the three platforms: 5 correct responses
for the 7 Ss fed on Platform
A, 6 for the 8 fed on B, and 5 for the
6 fed on C.

There are two methods by which 
we can assign a P value for chance 
expectation of a correct choice on the 
test. Since S could turn either right 
or left at the bifurcation, we can 
compute the significance of the dif- 
ference with chance as P = .50. 
Including only the 21 exploring rats, 
t = 2.402, significant at the .05 level. 

An alternative method of assigning 
chance probability takes any position 
preferences into account. If we take 
the proportion of right turns made 
during the familiarization phase by 
Ss for whom a right turn was correct 
on the test run, and the proportion of 
left turns made during the familiari- 
zation phase by Ss for whom a left 
turn was correct on the test run, this 
total proportion can be considered the 
probability of a correct turn on the 
test run on the basis of preferences 
evidenced during the familiarization
phase. Thus the proportion of turns
the rats made during the familiarization
phase in the direction of the
correct response on the test was computed
and taken as the chance probability
of a correct turn (P = .489).
Taking this P value and including
only the 21 exploring rats, t = 2.504,
also significant at the .05 level.
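The two tests above can be roughly reproduced with a critical ratio for a single proportion. The article does not state the exact formula used, so the normal-approximation form below (observed proportion minus chance proportion, divided by the standard error under the chance hypothesis) is an assumption, and the function name is hypothetical. With 16 correct responses among 21 rats it gives values near 2.40 and 2.50, close to the reported 2.402 and 2.504.

```python
import math


def critical_ratio(correct, n, chance_p):
    """Critical ratio of an observed proportion against a chance proportion.

    Assumes the usual normal approximation, with the standard error
    computed from the chance proportion rather than the observed one.
    """
    observed = correct / n
    standard_error = math.sqrt(chance_p * (1.0 - chance_p) / n)
    return (observed - chance_p) / standard_error


# Chance taken as an even right/left split at the bifurcation.
ratio_even = critical_ratio(16, 21, 0.50)

# Chance taken from turn preferences during familiarization (P = .489).
ratio_preference = critical_ratio(16, 21, 0.489)

print(round(ratio_even, 2), round(ratio_preference, 2))
```

The small gaps from the published figures would be consistent with rounding of the observed proportion before the ratio was taken, though that too is a guess.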


We would seem to be justified in 
concluding that the rats had learned 
something about the layout of the ap- 
paratus during the familiarization phase 
of the experiment, when no obvious 
reinforcers were present in the situation. 


SUMMARY 


Twenty-four rats were allowed to explore an 
apparatus similar to that used by Maier in his 
three-table reasoning problem, were then fed on 
one of the three platforms, and were tested to 
see if they could return directly to the platform 
on which they were fed. On the first test trial, 
16 of 21 rats (3 rats did not explore during the 
familiarization phase) made the correct turn 
to the platform on which they had been fed 
(P = .05). No specific source of reinforcement
during the familiarization phase could be readily 
identified. 

REFERENCES 

1. Maier, N. R. F. The effect of cerebral
destruction on reasoning and learning in
rats. J. comp. Neurol., 1932, 54, 45-75.

2. Maier, N. R. F. In defense of reasoning in
rats: a reply. J. comp. Psychol., 1935,
19, 197-206.

3. Seward, J. P. An experimental analysis of
latent learning. J. exp. Psychol., 1949,
39, 177-186.

4. Thistlethwaite, D. L. A critical review of
latent learning and related experiments.
Psychol. Bull., 1951, 48, 97-129.

(Received May 26, 1955)





Journal of Experimental Psychology
Vol. 51, No. 4, 1956 


NOTE ON “WISHING WITH DICE” 


LLOYD G. HUMPHREYS 1
Personnel Research Laboratory, AF Personnel & Training Research Center


The recent experiment by McConnell, 
Snowden, and Powell (3) dealing with a 
psychokinetic phenomenon was appar- 
ently very carefully controlled. There 
are, however, certain aspects of the 
results which bother the writer and leave 
him with a feeling he would like to see 
an independent replication of the ex- 
periment. 

The data can be analyzed, by means 
of the analysis of variance, into com- 
ponents comparable to those discussed 
in the original paper. Thus there are 
three data pages, two halves to each 
page, and three data blocks within each 
half. Since these divisions are con- 
sidered meaningful by the authors, the 
analysis which follows should not be 
considered arbitrary. 

The several components are listed in 
Table 1 under source of variation, along 
with their interactions and the within- 
groups or error term. The sums of 
squares were computed in the usual 
fashion with the exception that the total, 
from which the within sum of squares 
was obtained by subtraction, was computed
from the relationship, $\Sigma X^2 = \Sigma X$,
which holds when scores are
either zero or one.
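The shortcut can be illustrated with a short Python sketch. The function name is hypothetical, and the inputs are read from the tables of this note rather than stated in the text: 28,205 total hits (Table 2) and 393 x 18 x 24 = 169,776 throws (implied by the degrees of freedom in Tables 1 and 3). Treat the resulting figure as an illustration of the arithmetic, not as a value reported by the author.

```python
def total_sum_of_squares(hits, n_scores):
    """Total sum of squares about the mean for scores that are 0 or 1.

    Because X**2 == X for every such score, the sum of squared scores
    equals the sum of scores, which is simply the number of hits.
    """
    sum_x = hits
    sum_x_squared = hits  # the relationship: sum of X^2 equals sum of X
    return sum_x_squared - sum_x ** 2 / n_scores


# The identity itself, checked on a tiny made-up set of scores.
toy_scores = [0, 1, 0, 0, 1, 0]
identity_holds = sum(x * x for x in toy_scores) == sum(toy_scores)

# Assumed inputs taken from the tables: total hits and number of throws.
n_throws = 393 * 18 * 24
total_ss = total_sum_of_squares(28205, n_throws)

print(identity_holds, n_throws, round(total_ss, 2))
```

The within sum of squares would then follow by subtracting the between-cells sum of squares from this total, as the paragraph describes.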

It is acknowledged that it is somewhat
unusual to analyze the variance of
dichotomous data. In the present case
there is no problem with respect to
homogeneity of variance since all hit-ratios,
in spite of differences in experimental
conditions, are close to 1/6.
With respect to the form of the distribution,
Lindquist concludes (2, p. 81)
from Norton's investigation (4) that
“. . . the F-distribution is amazingly
insensitive to the form of the distribution
of criterion measures in the parent
population . . . .” One precedent also is
the widespread use of the Kuder-Richardson
coefficient of homogeneity,
which is based on an analysis of variance
model. Somewhat more suitable data
for the application of the analysis of
variance could be obtained from the
present experimental design by grouping
the successive trials for each S in each
data block. This would give a score
with a possible range from zero to 24,
but published data do not allow these
summations to be made.

1 Visiting Professor, University of Illinois,
fall semester, 1955; on leave from Personnel
Research Laboratory, Air Force Personnel and
Training Research Center, when this note was
pro


TABLE 1

ANALYSIS OF VARIANCE OF HITS FOR EACH DATA-BLOCK POSITION OF THE THREE DATA PAGES

Source of Variation                 df

Halves of page
Data-block positions
Pages
Halves X blocks
Halves X pages
Blocks X pages
Blocks X pages X halves
Total between
Within (error)                   169,758
Total                            169,775

F ratios legible in the source, in printed order:
3.45, 3.82*, 57.71†, 51.30†, 1.45, 1.37.

* P is less than .05 for a difference in the expected
direction.

† P is less than .05, but for a difference in the
unexpected direction, i.e., these interaction variances
are smaller than chance expectation.


If one were to follow the procedure
usually recommended in the analysis of
similar data, only one F ratio would be
computed in Table 1. This is the value
of 1.37 for the “total between,” which
tells one that there is nothing further to
test. Picking out special variances for
further test usually involves capitalization
on chance differences. It must
be admitted in the present case, however,
that there is some precedent for selecting
one particular source of variation for test.
Data block, which is the one in question,
makes a contribution to variance significant
at the .05 level.


TABLE 2

TWO-WAY TABLES OBTAINED FROM PUBLISHED DATA OF MCCONNELL, SNOWDON, AND POWELL

Halves of page (legible totals)

Half 1        14,245
Half 2        13,960
Total         28,205

Data blocks by pages (legible columns only)

Data Block    Page column    Page column    Total

1             3,136          3,180           9,589
2             3,074          3,117           9,368
3             3,100          3,093           9,248
Total         9,310          9,390          28,205

Other legible entries: 4,643 (a cell for Half 2); 9,390 (a page total).

But there are two additional, unan- 
ticipated significant F ratios in the table! 
Two interactions involving halves of the 
page, which signified a break in the 
experimental procedure, are smaller than 
one would expect to obtain by chance. 
The almost complete proportionality of 
the cell totals to row and column totals 
is observed in the upper two-thirds of 
Table 2. 

There are several alternative ex- 
planations for variances that are too 
small: namely, chance; systematic error 
in the experimental, data-recording, or 
data-analysis procedures; and use of the 
wrong error term. 

Chance can never be ruled out as a 
possibility no matter what level of 
significance is accepted. One should not 
accept differences, however, in the expected
direction at a given level of
significance as nonchance and reject
differences in the unexpected direction 
at the same significance level as chance. 
If this rule is followed, there are no 
positive results in the experiment. 

The second alternative, it should be 
noted, assumes systematic not random 
error. One of the interactions in ques- 
tion involved the one variable (data- 
block position) whose main effects were 
significant; both of the questionable 
interactions involved a variable (halves 
of page) whose main effects were ex- 
pected to be significant. Systematic 
error could result from expectation of 
differences. Considering the controls 
present, this may not seem to be a 
plausible explanation, but it would take 
only a small amount of error to account 
for the very small differences observed. 

The third alternative is not probable 
in the present experiment. This con- 
clusion is based upon inferences drawn 
from the characteristics of psychological 
data. The reasoning starts with a 
breakdown of degrees of freedom for the 
error term of Table 1. This analysis is 
presented in Table 3. 

TABLE 3

Breakdown of the Within-Groups
Variance of Table 1

Source of Variation                 df

Between Ss                         392
Halves X Ss                        392
Blocks X Ss                        784
Pages X Ss                         784
Halves X blocks X Ss               784
Halves X pages X Ss                784
Blocks X pages X Ss               1568
Blocks X pages X halves X Ss      1568
Within (393 X 18 X 23)         162,702

Total                          169,758

In the usual psychological experiment
these various components would not be
homogeneous. For example, all dependent
variables measured with nonzero
reliability exhibit significant individual
differences; i.e., the ratio of the
between-Ss variance to the within-variance
will be greater than unity. It is from this
ratio that a reliability coefficient can be
secured. Furthermore, interactions
involving Ss and experimental variables
will always be as large as or larger than
the within-variance, since variation of
experimental conditions tends to lower
intertrial correlations. On the other
hand, these same interactions will
generally be smaller than the between-Ss
variance, indicating positive intertrial
correlations, though experimental
conditions can at times produce negative
correlations, e.g., during reversal of a
discrimination. In the present case the
appropriate error terms for the two
questionable interactions are halves X
blocks X Ss and halves X pages X Ss,
but it is inconceivable that these could be
sufficiently small to account for the two
unexpectedly small interactions.
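
The link between the variance ratio and a reliability coefficient can be illustrated numerically. The text does not say which coefficient is meant, so the sketch below assumes the familiar analysis-of-variance (intraclass) forms: 1 - 1/F for the reliability of an S's mean over k observations, and (MSb - MSw) / (MSb + (k - 1) MSw) for a single observation. The function names and the toy scores are hypothetical.

```python
def one_way_mean_squares(scores):
    """Return (between-Ss mean square, within mean square, k).

    scores -- one list per S, each holding the same number k of
              repeated observations.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(scores, means) for x in row)
    return ss_between / (n - 1), ss_within / (n * (k - 1)), k


def reliability_from_ratio(ms_between, ms_within, k):
    """Return (F, reliability of the k-observation mean,
    reliability of a single observation) under the assumed forms."""
    f_ratio = ms_between / ms_within
    rel_mean = 1.0 - 1.0 / f_ratio
    rel_single = (ms_between - ms_within) / (
        ms_between + (k - 1) * ms_within)
    return f_ratio, rel_mean, rel_single


# Toy data: three Ss who differ consistently from one another.
msb, msw, k = one_way_mean_squares([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(reliability_from_ratio(msb, msw, k))
```

When the ratio does not exceed unity, both coefficients are zero or negative, which is the sense in which an absence of individual differences implies an absence of reliability.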

As a matter of fact, the various error
variances in Table 3 may well be
homogeneous, as McConnell, Snowdon,
and Powell in effect assumed in computing
their tests of significance.* If true, this
suggests another difficulty with their
findings. Lack of significant individual
differences when experimental conditions
are held constant is an indirect though
conclusive indication that a phenomenon
is not genuine psychologically. That is, if
regularities observed in the data are not
associated with the Ss of the experiment,
this also supports the conjecture that
some small, consistent experimental,
data-recording, or data-analysis bias is
present.


* It should be noted that there is no error in
multiplying Ss times observations in estimating
df available for error if each observation is
independent of all the rest. Thus one of
Burdock’s criticisms (1) of Schmeidler (5) does
not necessarily hold and is always subject to
experimental test. However, the imperative is
on the original author to justify this procedure;
furthermore, it should be done explicitly.


REFERENCES 


1. Burdock, E. I. A case of ESP: critique of
   “personal values and ESP scores” by
   Gertrude R. Schmeidler. J. abnorm. soc.
   Psychol., 1954, 49, 314-315.

2. Lindquist, E. F. Design and analysis of
   experiments in psychology and education.
   Boston: Houghton Mifflin, 1953.

3. McConnell, R. A., Snowdon, R. J., &
   Powell, K. F. Wishing with dice. J.
   exp. Psychol., 1955, 50, 269-275.

4. Norton, D. W. An empirical investigation
   of some effects of non-normality and
   heterogeneity in the F-distribution.
   Unpublished doctor's thesis, State
   Univ. of Iowa, 1952.

5. Schmeidler, G. R. Personal values and
   ESP scores. J. abnorm. soc. Psychol.,
   1952, 47, 757-761.


(Received December 7, 1955) 











